SQL - TIPS, TRICKS AND TOOLS - IBM Mainframe

In the previous three chapters we have discussed the Structured Query Language (SQL) as it applies to DB2 in great detail. But as you must have already realized, the same results can be achieved through different SQL. For example, you can do a 'SELECT *...' to get all the columns, or you can name the columns you want. You can use an 'INSERT INTO...column1, column2, ... VALUES value1, value2, ...' or you can omit the column names if you are inserting a value for every column of the row. No particular method of coding an SQL statement is wrong, but some are better than others in terms of performance and clarity. In this chapter we will discuss some of the methods of writing more efficient and better SQL. But it must always be kept in mind that not all the methods suggested below hold true in all cases. Some of them should be used with care and only after considering all the other implications.

Efficient Data Retrieval Methods

Given below are some SQL access guidelines that will improve performance. Not all of them are applicable in all cases, so choose them judiciously and test them before using them in your applications.

  • Pre-test all embedded SQL. Before embedding an SQL statement in an application program, test it using SPUFI or QMF. This will reduce the time and effort spent on testing and debugging the application program as a whole, because once you are sure that the SQL part of the program is thoroughly tested and debugged, you can concentrate on the rest of the program.
  • Use EXPLAIN. Use the EXPLAIN facility provided by DB2 to gain further insight into the performance potential of each SQL statement in an application. When EXPLAIN analysis is done on an SQL statement or an application plan, information about the access strategy chosen by the DB2 optimizer is recorded in a table named PLAN_TABLE. The analyst can query the PLAN_TABLE and determine the efficiency of the SQL; a minimal sketch is given after this list.
  • Use Query Analyser Tools. Utilize the available analysis tools such as Computer Associates' Pro-optimize or Platinum Technology's Plan Analyzer. These tools analyze the SQL and provide a clear and detailed description of the access paths chosen by the DB2 optimizer, why they were chosen, the alternative solutions available, etc.
  • Never use SELECT *. Never ask DB2 for more than you require in a query: 'Not a penny more, not a penny less.' Each query should access only the columns needed for the function that will be performed. Following this will result in maximum flexibility and efficiency. If all the columns are genuinely required, there is no harm in using SELECT *; but if SELECT * is used even when all the columns are not required, it will prevent the DB2 optimizer from choosing the best access path. Sometimes it may still be reasonable to use SELECT * to reduce the number of SQL statements in a system. For example, if different application programs each want one or two columns of the same table and the number of distinct SQL statements would otherwise be very large, SELECT * can be chosen at the cost of some performance, because it reduces the number of SQL statements considerably.
  • Never use a cursor for fetching a single row. To use a cursor you have to DECLARE, OPEN, FETCH and CLOSE it; a singleton SELECT is much faster (see the sketch after this list). But if you want to update a row after it has been selected, use a cursor with the FOR UPDATE OF clause, because it ensures data integrity by causing DB2 to take an exclusive lock on the page containing the row to be updated.
  • Use FOR FETCH ONLY. When a read-only SELECT statement is coded, always use the FOR FETCH ONLY clause. This enables DB2 to use block fetch, which returns the fetched rows more efficiently. QMF automatically appends FOR FETCH ONLY to SELECT statements. Static SQL embedded in application programs automatically uses block fetch if the BIND process determines it to be feasible.
  • Avoid using DISTINCT. The DISTINCT keyword removes the duplicates from the result table. If duplicates are not a problem, or if there is no chance of a duplicate occurring, avoid using it because it adds to the overhead.
  • Limit the data selected. Return the minimum number of rows needed by the application program. Always use the WHERE clause effectively and do not code generic SELECT statements.
  • Code Predicates on Indexed Columns. DB2 performs more efficiently when using an index. Although this is true in most cases, there are exceptions, such as when you request most of the rows in the table or when access is through a non-clustered index; in these cases indexed access can perform worse than non-indexed access.
  • Use ORDER BY when sequence is important, because if ORDER BY is not specified there is no guarantee that the rows returned will be in the required sequence.
  • Limit the columns specified in the ORDER BY clause. When ORDER BY is used, DB2 ensures that the data is sorted in order by the specified columns. If there is no appropriate index, this involves a sort operation, so the more columns specified in the ORDER BY clause, the poorer the performance will be.
  • Use Equivalent Data Types. Use the same data type and length when comparing column values to host variables or literals. This eliminates the need for data conversion. For example, to compare a CHAR(6) column with a CHAR(5) column, DB2 has to do a data conversion, and when DB2 has to convert data the available indexes are not used. The same is the case when there is a mismatch in length or data type between a column and a host variable that are being compared.
  • Use BETWEEN instead of <= and >=. The BETWEEN predicate is more efficient than the 'greater/less than or equal to' predicates because the optimizer selects the most efficient path for the BETWEEN predicate.
  • Use IN instead of LIKE. Use IN or BETWEEN instead of LIKE whenever possible. If you know that only a certain number of occurrences exist, using IN with the specific list will be much more efficient than LIKE. You can imitate the functionality of LIKE by using a range of values. For example, to get the names of all the employees starting with 'A', instead of coding emp-name LIKE 'A%' you can code emp-name BETWEEN 'AAAAAA' AND 'AZZZZZ', provided the name is 6 characters long. If LIKE is used, avoid using the '%' and '_' signs at the beginning of the comparison string, because that prevents DB2 from using a matching index. The LIKE predicate can produce efficient results when the '%' or '_' sign is used at the end or in the middle of the comparison string.
  • Avoid using NOT. Whenever possible avoid the use of NOT (<>). Instead, reword the SQL so that the same results are obtained. For example, use 'WHERE sum > 100' instead of 'WHERE NOT sum <= 100'.
  • Code the most restrictive predicate first. When coding the predicates in the WHERE clause, place the most restrictive one first; in other words, put the condition that will eliminate the most rows as the first predicate. For example, if you want to get the names of all the persons whose name starts with the letter 'M' in the city 'Bombay', instead of coding WHERE name BETWEEN 'MAAAAA' AND 'MZZZZZ' AND city = 'Bombay', code WHERE city = 'Bombay' AND name BETWEEN 'MAAAAA' AND 'MZZZZZ'.
  • Do not use Arithmetic expressions in a predicate. If you use an arithmetic expression in a predicate, DB2 will not use an index on a column that is included in the expression. Perform the calculation before the SQL statement and then use the result in the query (see the sketch after this list).
  • Use date and time arithmetic with care. Here the rule that we have discussed in chapter 3 should be kept in mind.
  • Specify Functions to format Timestamps. When possible, use scalar functions such as DATE, DAY and TIME to return only the portion of a timestamp column that is actually needed, reducing the length of the data retrieved.
  • Limit the use of scalar functions. If you can avoid scalar functions do so, but use them to offload work from the application program to the database management system. Indexes will not be used for columns to which scalar functions are applied.
  • Use UNION ALL instead of UNION. The UNION operator always results in a sort. So in cases where duplicates are not possible, or where elimination of duplicates is unnecessary, use UNION ALL instead of UNION.
  • Use NOT EXISTS instead of NOT IN. When you are coding a subquery using negation logic, use NOT EXISTS instead of NOT IN to increase the efficiency of the SQL. This is because with NOT EXISTS, DB2 must verify only non-existence, whereas with NOT IN, DB2 must materialize the complete result set of the subquery (see the sketch after this list).
  • Use a constant for existence checking. When using EXISTS to test for the existence of a particular row, specify a constant in the subquery SELECT list. The SELECT list of the subquery is unimportant because the statement is checking for existence only and will not return the columns. For example, to list all the suppliers who are supplying at least one part, code the SQL as shown in the sketch after this list.

  • Minimise the number of tables in a Join. Even though DB2 allows as many as 15 tables to be joined, try to minimize the number. As a general rule, try to limit the number of tables joined to fewer than five.
  • Reduce the number of rows to be joined using predicates in the join. Only the rows that are actually required should be retrieved.
  • Join using SQL instead of Program logic. Whatever can be achieved through a join should not be attempted using a high-level language, because that will always be much more expensive in terms of resources.
  • Use joins instead of subqueries. A join can be more efficient than a subquery or a correlated subquery using IN. This is illustrated in the examples in the section dealing with join and subqueries.
  • Join on indexed columns. The efficiency of your programs improves when tables are joined based on indexed columns rather than non-indexed ones.
  • Avoid Cartesian products.
  • Limit the columns Grouped. When you are using the GROUP BY clause specify only those columns that need to be grouped.
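
As an illustration of the EXPLAIN guideline above, the following sketch runs EXPLAIN on a query and then reads the access path information back from the PLAN_TABLE. The EMPLOYEE table, its columns and the query number are illustrative assumptions, and a PLAN_TABLE is assumed to already exist under your authorization ID.

    -- Hypothetical EMPLOYEE table; QUERYNO 100 is chosen arbitrarily
    EXPLAIN PLAN SET QUERYNO = 100 FOR
        SELECT EMP_NO, EMP_NAME
          FROM EMPLOYEE
         WHERE DEPT_NO = 'D11';

    -- Read back the access path chosen by the optimizer for query 100
    SELECT QUERYNO, METHOD, TNAME, ACCESSTYPE, ACCESSNAME, MATCHCOLS, INDEXONLY
      FROM PLAN_TABLE
     WHERE QUERYNO = 100
     ORDER BY QBLOCKNO, PLANNO;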
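
For the guideline on single-row retrieval, a singleton SELECT fetches the row directly into host variables with no cursor overhead. This is a minimal embedded-SQL sketch; the EMPLOYEE table, its columns and the COBOL host variable names are illustrative assumptions.

    * EMPLOYEE table and the WS- host variables are illustrative
    EXEC SQL
        SELECT EMP_NAME, EMP_SALARY
          INTO :WS-EMP-NAME, :WS-EMP-SALARY
          FROM EMPLOYEE
         WHERE EMP_NO = :WS-EMP-NO
    END-EXEC.

The equivalent cursor version would need a DECLARE, an OPEN, a FETCH and a CLOSE to retrieve the same single row.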
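
For the guideline on arithmetic expressions in predicates, the sketch below shows the same condition written two ways against a hypothetical EMPLOYEE table with a SALARY column. In the first form the column is buried inside an expression, so an index on SALARY cannot be used; in the second, the arithmetic has been moved off the column (it could equally be computed in the program beforehand and passed in as a host variable).

    -- Index on SALARY cannot be used: the column appears inside an expression
    SELECT EMP_NO, EMP_NAME
      FROM EMPLOYEE
     WHERE SALARY * 12 > 600000;

    -- Rewritten so the column stands alone: an index on SALARY can be used
    SELECT EMP_NO, EMP_NAME
      FROM EMPLOYEE
     WHERE SALARY > 600000 / 12;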
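
For the NOT EXISTS guideline, the two statements below list the suppliers who currently supply no parts, assuming the classic supplier table S and shipment table SP (these table and column names are illustrative).

    -- NOT IN: DB2 must materialize the complete list of supplier numbers in SP
    SELECT SNO, SNAME
      FROM S
     WHERE SNO NOT IN (SELECT SNO FROM SP);

    -- NOT EXISTS: DB2 only has to verify non-existence for each supplier
    SELECT SNO, SNAME
      FROM S
     WHERE NOT EXISTS
           (SELECT 1 FROM SP WHERE SP.SNO = S.SNO);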
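
The existence-checking example promised in the list above, again assuming the illustrative supplier table S and shipment table SP: the constant 1 in the subquery SELECT list makes it explicit that no column values need to be returned.

    SELECT SNO, SNAME
      FROM S
     WHERE EXISTS
           (SELECT 1                -- a constant: only existence matters
              FROM SP
             WHERE SP.SNO = S.SNO);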

Efficient Data Modification Methods

You can modify data using the data modification statements, namely INSERT, UPDATE and DELETE. In this section we will discuss some guidelines that will improve the performance of your update operations.

  • Limit updating indexed columns. When columns in indexes are updated, a corresponding update is applied to every index in which the column participates. This can reduce performance considerably because of the additional I/O overhead.
  • Use FOR UPDATE OF correctly. Specify only those columns that need to be updated in the FOR UPDATE OF column list. DB2 will not use any index that contains columns listed in the FOR UPDATE OF clause.
  • Consider using DELETE/INSERT instead of FOR UPDATE OF. If all the columns in a row are being updated, it is always better to delete the old row and insert a new row rather than use the FOR UPDATE OF clause. This gives DB2 an opportunity to use an index, whereas with the FOR UPDATE OF clause DB2 would have to do a tablespace scan with an 'RR' lock on the affected table.
  • Direct UPDATE vs UPDATE using WHERE CURRENT OF. If the data does not have to be retrieved by the application before the update, use a direct SQL UPDATE statement instead of an UPDATE using the WHERE CURRENT OF option. A cursor update with the WHERE CURRENT OF option performs worse than a direct update, because in the first case all the rows have to be fetched and updated one row at a time, whereas the direct UPDATE affects multiple rows with a single statement (see the sketch after this list). Moreover, when using a cursor you must add the overhead of the DECLARE, OPEN and CLOSE statements.
  • UPDATE only changed columns. UPDATE statements should specify only the columns whose values will be modified. DB2 checks each host variable to see whether there is a data change, and only if there is a change is the column modified. So unnecessarily adding columns will result in poorer performance, because DB2 will have to do this checking for all the columns specified.
  • Consider dropping indexes before large insertions. When you insert a large number of rows into a single table, every index must be updated with the column values and row IDs (RIDs) for each inserted row. For very large insertions this can leave the indexes disorganized and can result in very poor performance. It may be more efficient to drop all the indexes, perform the insertions and then re-create the indexes. But before doing this, the pros and cons of both methods must be considered.
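
As a sketch for the direct-UPDATE guideline above, the single statement below moves every employee of one department to another in one pass; the cursor alternative is outlined in the comments. The EMPLOYEE table and its columns are illustrative assumptions.

    -- Direct UPDATE: all qualifying rows are changed with a single statement
    UPDATE EMPLOYEE
       SET DEPT_NO = 'D21'
     WHERE DEPT_NO = 'D11';

    -- The cursor alternative would need:
    --   DECLARE C1 CURSOR FOR
    --     SELECT DEPT_NO FROM EMPLOYEE
    --      WHERE DEPT_NO = 'D11'
    --      FOR UPDATE OF DEPT_NO;
    -- followed by an OPEN, a FETCH loop issuing
    --   UPDATE EMPLOYEE SET DEPT_NO = 'D21' WHERE CURRENT OF C1;
    -- for each row, and a CLOSE. That route is only justified when each row
    -- must be examined by the program before it is changed.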
