DBRMS, BIND, PLANS, AND THE OPTIMIZER - IBM Mainframe

In DB2, the host language's procedural code is separated from the SQL code during a pre-compilation process. Each is compiled separately (the SQL compilation is called BIND) and then rejoined at execution time. In this section we will take a brief look at the components involved in this process.

As you can see in the following figure, the input to the Precompiler is the program source code. There are two outputs—a modified source module and a Data Base Request Module (DBRM). The modified source module contains host language calls to DB2, which the Precompiler inserts in place of SQL statements, which it comments out. The DBRM consists of the SQL statements from the program.

The modified source module is input to compile and link-edit steps to produce a load module, while the DBRM is input to a BIND step to produce a compiled set of run-time structures called an application plan. The application plan is stored in a system-controlled area called the Directory. Thespian contains information about the DB2 resources and paths required to execute the SQL statements (data, indexes, etc.). The application plan and load module work together at execution time.

Although we will not describe the BIND process fully at this point, we will mention one part of it called optimization. We said that SQL statements justify only the data that you want, not how to get to it. The Optimizer performs automatic access path selection as part of the BIND process. It considers both the available access paths (indexes, sequential reads, etc.) and system-held statistics on the data to be accessed (the size of the table, the number of distinct values in a particular column, etc.). From this, it chooses what it considers to be the most efficient access path for each query. These access paths are "bound" into the application plan. It is worth mentioning that although we listed automatic access path selection as one of DB2's advantages, it is also a double-edged sword because at times you will wish you could control the choice of access path. While you can influence the Optimizer's choices, you can never control it.

Application Program Preparation and Execution

Application Program Preparation and Execution

DB2 Directory and Catalog

Information about the DB2 subsystem is maintained in DB2's Directory and Catalog. These two components differ in that the Directory is kept solely for DB2*s internal use, while the Catalog, which contains descriptive information about the plans (e.g., creation date), may be accessed by both DB2 and DB2 users.

DB2's Catalog contains approximately 30 tables, which are central to DB2's functioning. These are sometimes called system tables to distinguish them from user tables. DB2 uses the Catalog to determine access paths, check authorizations, validate BIND requests, etc. In addition, DB2 users may use SQL to query the Catalog. For example, if you are a database administrator, you may look in SYSIBM.SYSINDEXES, which provides information for monitoring and tuning the system's indexes. Catalog tables can also be quite useful to programmers who can find, in SYSIBM.SYSCOLUMNSV for example, the length and data type of each column in an existing table. However, some installations do not grant the authority to read these tables to application programmers for fear of downgrading the system's overall performance.

The SQL statements used to get information from these system tables are the same as those used to get information from user tables. The difference is that you cannot update these tables directly with SQL, but you are able to update user tables. The only exception to this is the feature that allows you to update specific Catalog columns used for performance tuning. In general, system tables are modified as a result of definition statements, which create the database, authorization statements which grant privileges to users of the database, the BIND process and DB2 utilities. For example, when you create a table, a row is inserted in the Catalog table that maintains a list of all DB2 tables(called SYSIBM.SYSTABLES). The columns of SYSIBM.SYSTABLES include the name of the table and the user ID of the table's creator. Likewise, removing a table from the database causes the deletion of the row that describes it in SYSIBM.SYSTABLES.

DB2 keeps track of privileges and levels of authorization for different users. When you instruct DB2 to BIND a DBRM into an application plan, a row is inserted into the SYSIBM.SYSPLANAUTH table. A column in that table specifies that you have the authority to execute the plan that you just created. The execution of several DB2 utilities also updates the Catalog tables. The RUNSTATS utility gathers statistics about the database and updates tables in the Catalog accordingly. The Optimizer uses the Catalog to check these statistics when choosing an access path.

More about Bind...

The Bind operation requires a more detailed treatment. The Precompiler removes all the SQL statements it finds in the source code and replaces them with equivalent host language CALL statements. Then it uses these SQL statements to build a Database Request Module (DBRM) for the program. The DBRM consists of an edited form of the original SQL statement, together with some additional information. The function of the Bind is to convert the high-level DBRMs into an optimized internal form. Its input is a DBRM and output is a package, which is stored in the DB2 directory. Bind performs two major jobs syntax checking and optimization.

Bind examines the SQL statements in the DBRM and checks for any syntax errors. Even though the precompiler has already done such a check, Bind performs it again because precompiler is de-coupled from DB2 and an invalid DBRM can be created via some other means or the DBRM can get corrupted after it was generated.

The second function of Bind is optimization. Bind includes an optimizer as a sub component. The optimizer chooses for each SQL statement an optimal access strategy for implementing that statement. From the different strategies available, the optimizer chooses the most optimal one based on the factors like:

  • How many tables are referenced in the SQL statement?
  • What is the size of each table, in terms of number of columns and rows?
  • What are the indexes?
  • How selective those indexes are?
  • How the data is physically clustered on the disk?
  • The form of the WHERE clause in the request

In addition to the above said functions Bind also performs an authorization checking, i.e., whether the user who is to be the owner of the bound package. Is allowed to perform the operations requested in the DBRM to be bound.

Automatic Recompilation

Since DB2 is a compiling system, i.e., the Database Requests are compiled by Bind into internal form, from the performance point of view it is always faster than other database systems. But one significant drawback of this is that the decisions made by the compiler or Bind at compilation time may no longer valid at execution time. For example an index that was chosen as the optimum path during the Bind process get dropped before execution. Usually this will throw the entire program execution into disarray. But DB2 has a facility to counter this problem, Automatic recompilation/rebinding.

When an index is dropped, DB2 examines the catalog to find out which packages are dependent on that index. Any such package will be marked 'invalid'. When the runtime supervisor retrieves such a package for execution, seeing the 'invalid' mark invokes Bind to produce a new package or in other words chose the next available optimum access path. After the recompilation the new packages replaces the old one and the process continues as if not" interrupted. The only indication to the user is a slight delay in the execution of the SQL statement. Thus DB2 provides a high degree of physical data independence, i.e., the users and user programs are not dependent on the physical structure of the stored database. So the physical structure can be changed without having to make any changes in the application programs. Automatic recompilation will not work in all cases, for example if the table itself is dropped and a new one is created with totally different structure, then automatic recompilation will fail. Another point is that automatic recompilation is done if the optimal path is no longer valid, for example if the index is dropped. But DB2 will not perform an automatic recompilation if an index is added. This is because there is guarantee that the recompilation will be profitable.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

IBM Mainframe Topics