Tuning Distributed Queries - Oracle 10g

The local Oracle Database server breaks the distributed query into a corresponding number of remote queries, which it then sends to the remote nodes for execution. The remote nodes execute the queries and send the results back to the local node. The local node then performs any necessary post-processing and returns the results to the user or application.

You have several options for designing your application to optimize query processing. This section contains the following topics:

  • Using Collocated Inline Views
  • Using Cost-Based Optimization
  • Using Hints
  • Analyzing the Execution Plan

Using Collocated Inline Views

The most effective way of optimizing distributed queries is to access the remote databases as little as possible and to retrieve only the required data.

For example, assume you reference five remote tables from two different remote databases in a distributed query and have a complex filter (for example, WHERE r1.salary + r2.salary > 50000). You can improve the performance of the query by rewriting the query to access the remote databases once and to apply the filter at the remote site. This rewrite causes less data to be transferred to the query execution site.

Rewriting your query to access the remote database once is achieved by using collocated inline views. The following terms need to be defined:

  • Collocated

    Two or more tables located in the same database.

  • Inline view

    A SELECT statement that is substituted for a table in a parent SELECT statement. The embedded SELECT statement, shown within the parentheses is an example of an inline view:


  • Collocated inline view

    An inline view that selects data from multiple tables from a single database only. It reduces the amount of times that the remote database is accessed, improving the performance of a distributed query.

    Oracle recommends that you form your distributed query using collocated inline views to increase the performance of your distributed query. Oracle Database cost-based optimization can transparently rewrite many of your distributed queries to take advantage of the performance gains offered by collocated inline views.

Using Cost-Based Optimization

In addition to rewriting your queries with collocated inline views, the cost-based optimization method optimizes distributed queries according to the gathered statistics of the referenced tables and the computations performed by the optimizer.
For example, cost-based optimization analyzes the following query. The example assumes that table statistics are available. Note that it analyzes the query inside a CREATE TABLE statement:

The alias v is assigned to the inline view, which can then be referenced as a table in the preceding SELECT statement. Creating a collocated inline view reduces the amount of queries performed at a remote site, thereby reducing costly network traffic.

How Does Cost-Based Optimization Work?

The main task of optimization is to rewrite a distributed query to use collocated inline views. This optimization is performed in three steps:

  1. All mergeable views are merged.
  2. Optimizer performs collocated query block test.
  3. Optimizer rewrites query using collocated inline views.

After the query is rewritten, it is executed and the data set is returned to the user.

While cost-based optimization is performed transparently to the user, it is unable to improve the performance of several distributed query scenarios. Specifically, if your distributed query contains any of the following, cost-based optimization is not effective:

  • Aggregates
  • Subqueries
  • Complex SQL

Setting Up Cost -Based Optimization
After you have set up your system to use cost-based optimization to improve the performance of distributed queries, the operation is transparent to the user. In other words, the optimization occurs automatically when the query is issued. You need to complete the following tasks to set up your system to take advantage of cost-based optimization:

  • Setting Up the Environment
  • Analyzing Tables

Setting Up the Environment To enable cost-based optimization, set the OPTIMIZER_ MODE initialization parameter to CHOOSE or COST. You can set this parameter by:

  • Modifying the OPTIMIZER _MODE parameter in the initialization parameter file
  • Setting it at session level by issuing an ALTER SESSION statement

Issue one of the following statements to set the OPTIMIZER _MODE initialization parameter at the session level:

Analyzing Tables For cost -based optimization to select the most efficient path for a distributed query, you must provide accurate statistics for the tables involved. You do this using the DBMS _STATS package or the ANALYZE statement.

The following DBMS _STATS procedures enable the gathering of certain classes of optimizer statistics:

  • GATHER_INDEX_STATS
  • GATHER_TABLE_STATS
  • GATHER_SCHEMA_STATS
  • GATHER_DATABASE_STATS

For example, assume that distributed transactions routinely access the scott.dept table. To ensure that the cost -based optimizer is still picking the best plan, execute the following:

Using Hints

If a statement is not sufficiently optimized, then you can use hints to extend the capability of cost-based optimization. Specifically, if you write your own query to utilize collocated inline views, instruct the cost -based optimizer not to rewrite your distributed query.

Additionally, if you have special knowledge about the database environment (such as statistics, load, network and CPU limitations, distributed queries, and so forth), you can specify a hint to guide cost -based optimization. For example, if you have written your own optimized query using collocated inline views that are based on your knowledge of the database environment, specify the NO_MERGE hint to prevent the optimizer from rewriting your query.

This technique is especially helpful if your distributed query contains an aggregate, subquery, or complex SQL. Because this type of distributed query cannot be rewritten by the optimizer, specifying NO _MERGE auses the optimizer to skip the steps described in "How Does Cost-Based Optimization Work?"

The DRIVING _SITE hint lets you define a remote site to act as the query execution site. In this way, the query executes on the remote site, which then returns the data to the local site. This hint is especially helpful when the remote site contains the majority of the data.

Using the NO _MERGE Hint

The NO _MERGE hint prevents the database from merging an inline view into a potentially noncollocated SQL statement. This hint is embedded in the SELECT statement and can appear either at the beginning of the SELECT statement with the inline view as an argument or in the query block that defines the inline view.

Typically, you use this hint when you have developed an optimized query based on your knowledge of your database environment.

Using the DRIVING _SITE Hint

The DRIVING _SITE hint lets you specify the site where the query execution is performed. It is best to let cost -based optimization determine where the execution should be performed, but if you prefer to override the optimizer, you can specify the execution site manually.

Following is an example of a SELECT statement with a DRIVING _SITE hint:

Analyzing the Execution Plan

An important aspect to tuning distributed queries is analyzing the execution plan. The feedback that you receive from your analysis is an important element to testing and verifying your database. Verification becomes especially important when you want to compare plans. For example, comparing the execution plan for a distributed query optimized by cost -based optimization to a plan for a query manually optimized using hints, collocated inline views, and other techniques.

Preparing the Database to Store the Plan

Before you can view the execution plan for the distributed query, prepare the database to store the execution plan. You can perform this preparation by executing a script. Execute the following script to prepare your database to store an execution plan:

SQL> @UTLXPLAN.SQL

After you execute utlxplan.sql, a table, PLAN_TABLE, is created in the current schema to temporarily store the execution plan.

Generating the Execution Plan

After you have prepared the database to store the execution plan, you are ready to view the plan for a specified query. Instead of directly executing a SQL statement, append the statement to the EXPLAIN PLAN FOR clause. For example, you can execute the following:

Viewing the Execution Plan

After you have executed the preceding SQL statement, the execution plan is stored temporarily in the PLAN _TABLE that you created earlier. To view the results of the execution plan, execute the following script:

@UTLXPLS.SQL

Executing the utlxpls.sql script displays the execution plan for the SELECT statement that you specified. The results are formatted as follows:

If you are manually optimizing distributed queries by writing your own collocated inline views or using hints, it is best to generate an execution plan before and after your manual optimization. With both execution plans, you can compare the effectiveness of your manual optimization and make changes as necessary to improve the performance of the distributed query.

To view the SQL statement that will be executed at the remote site, execute the following select statement:


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Oracle 10g Topics