Primary Optimizations MySQL

This section discusses the most important optimizations performed by the server.

Optimizing Constant Relations

Constant Propagation

A transformation takes place for expressions like this:

For such expressions, since it is known that, if A=B and B=C then A=C (the Transitivity Law), the transformed condition becomes:

This transformation occurs for column1 <operator> column2 conditions if and only if <operator> is one of these operators: =, <, >, <=, >=, <>, <=>, LIKE. That is, transitive transformations don't apply for BETWEEN. Probably they should not apply for LIKE either, but that's a story for another day.

Constant propagation happens in a loop, so the output from one propagation step can be input for the next step.

See: /sql/sql_select.cc, change_cond_ref_to_const(). Or See: sql/sql_select.cc, propagate_cond_constants().

Eliminating "Dead" Code

A transformation takes place for conditions that are always true, for example:

In this case, the first condition is removed, leaving

A transformation also takes place for conditions that are always false. For example, consider this WHERE clause:

Since the parenthesized part is always false, it is removed, reducing this expression to

In some cases, where the WHERE clause represents an impossible condition, the optimizer might eliminate it completely. Consider the following:

Because it is never possible for this condition to be true, the EXPLAIN statement will show the words Impossible WHERE. Informally, we at MySQL say that the WHERE has been “optimized awayâ€.

If a column cannot be NULL, the optimizer removes any non-relevant IS NULL conditions. Thus, WHERE not_null_column IS NULL is an always-false situation, and WHERE not_null_column IS NOT NULL is an always-true situation — so such columns are also eliminated from the conditional expression. This can be tricky. For example, in an OUTER JOIN, a column which is defined as NOT NULL might still contain a NULL. The optimizer leaves IS NULL conditions alone in such exceptional situations.

The optimizer will not detect all Impossible WHERE situations — there are too many possibilities in this regard. For example:

The optimizer will not eliminate the condition in the query, even though the CREATE TABLE definition makes it an impossible condition.

Folding of Constants

A transformation takes place for this expression:

Before you say, "but I never would write 1 + 2 in the first place"; remember what was said earlier about constant propagation. It is quite easy for the optimizer to put such expressions together. This process simplifies the result.

Constants and Constant Tables

A MySQL constant is something more than a mere literal in the query. It can also be the contents of a constant table, which is defined as follows:

  1. A table with zero rows, or with only one row
  2. A table expression that is restricted with a WHERE condition, containing expressions of the form column = constant, for all the columns of the table's primary key, or for all the columns of any of the table's unique keys (provided that the unique columns are also defined as NOT NULL).

For example, if the table definition for Table0 contains

These rules mean that a constant table has at most one row value. MySQL will evaluate a constant table in advance, to find out what that value is. Then MySQL will “plug” that value into the query. Here's an example:

When evaluating this query, MySQL first finds that table Table1 — after restriction with Table1. unique_not_null_column — is a constant table according to the second definition above. So it retrieves that value. If the retrieval fails (there is no row in the table with unique_not_null_column = 5), then the constant table has zero rows and you will see this message if you run EXPLAIN for the statement:Impossible WHERE noticed after reading const tables Alternatively, if the retrieval succeeds (there is exactly one row in the table with unique_not_null_column = 5), then the constant table has one row and MySQL transforms the query to this:

Actually this is a grand-combination example. The optimizer does some of the transformation because of constant propagation, which we described earlier. By the way, we described constant propagation first because it happens happens before MySQL figures out what the constant tables are. The sequence of optimizer steps sometimes makes a difference.

Although many queries have no constant-table references, it should be kept in mind that whenever the word constant is mentioned hereafter, it refers either to a literal or to the contents of a constant table.

See: /sql/sql_select.cc, make_join_statistics().

Optimizing Joins

This section discusses the various methods used to optimize joins.

Determining the Join Type

When evaluating a conditional expression, MySQL decides what join type the expression has. (Again: despite the word “join”, this applies for all conditional expressions, not just join expressions. A term like “access type” would be clearer.) These are the documented join types, in order from best to worst:

  • system: a system table which is a constant table
  • const: a constant table
  • eq_ref: a unique or primary index with an equality relation
  • ref: an index with an equality relation, where the index value cannot be NULL
  • ref_or_null: an index with an equality relation, where it is possible for the index value to be NULL
  • range: an index with a relation such as BETWEEN, IN, >=, LIKE, and so on.
  • index: a sequential scan on an index.
  • ALL: a sequential scan of the entire table

See: /sql/sql_select.h, enum join_type{}. Notice that there are a few other(undocumented) join types too, for subqueries.

The optimizer can use the join type to pick a driver expression. For example, consider this query:

Since indexed_column has a better join type, it is more likely to be the driver. You'll see various exceptions as this description proceeds, but this is a simple first rule.

What is significant about a driver? Consider that there are two execution paths for the query:

The "Bad" Execution Plan: Read every row in the table. (This is called a sequential scan of Table1 or simply table scan.) For each row, examine the values in indexed_column and in unindexed_ column, to see if they meet the conditions.

The "Good" Execution Plan: Via the index, look up the rows which have indexed_column = 5.

(This is called an indexed search.) For each row, examine the value in unindexed_column to see if it meets the condition.

An indexed search generally involves fewer accesses than a sequential scan, and far fewer accesses if the table is large but the index is unique. That is why it is better to access with the “good” execution plan, and why it is often good to choose indexed_column as the driver.

Joins and Access Methods

Bad join choices can cause more damage than bad choices in single-table searches, so MySQL developers have spent proportionally more time making sure that the tables in a query are joined in an optimal order and that optimal access methods (often called access paths) are chosen to retrieve table data.

A combination of a fixed order in which tables are joined and the corresponding table access methods for each table is called query execution plan (QEP). The goal of the query optimizer is to find an optimal QEP among all such possible plans. There are several general ideas behind join optimization.

Each plan (or part of plan) is assigned a cost. The cost of a plan reflects roughly the resources needed to compute a query according to the plan, where the main factor is the number of rows that will be accessed while computing a query. Once we have a way to assign costs to different QEPs we have a way to compare them. Thus, the goal of the optimizer is to find a QEP with minimal cost among all possible plans.

In MySQL, the search for an optimal QEP is performed in a bottom-up manner. The optimizer first considers all plans for one table, then all plans for two tables, and so on, until it builds a complete optimal QEP. Query plans that consist of only some of the tables (and predicates) in a query are called partial plans. The optimizer relies on the fact that the more tables that are added to a partial plan, the greater its cost. This allows the optimizer to expand with more tables only the partial plans with lower cost than the current best complete plan.

The key routine that performs the search for an optimal QEP is sql/sql_select.cc, find_best(). It performs an exhaustive search of all possible plans and thus guarantees it will find an optimal one.

Below we represent find_best() in an extremely free translation to pseudocode. It is recursive, so some input variables are labeled “so far†to indicate that they come from a previous iteration.

Here the optimizer applies a depth-first search algorithm. It performs estimates for every table in theFROM clause. It will stop a search early if the estimate becomes worse than the best estimate so far. Theorder of scanning will depend on the order that the tables appear in the FROM clause.

See: /sql/table.h, struct st_table.

ANALYZE TABLE may affect some of the factors that the optimizer considers.

See also: /sql/sql_sqlect.cc, make_join_statistics().

The straightforward use of find_best() and greedy_search() will not apply for LEFT JOINor RIGHT JOIN. For example, starting with MySQL 4.0.14, the optimizer may change a left join to astraight join and swap the table order in some cases. See also LEFT JOIN and RIGHT JOIN Optimization[http://dev.mysql. com/doc/refman/5.1/en/left-join-optimization.html].

The range Join Type

Some conditions can work with indexes, but over a (possibly wide) range of keys. These are known as range conditions, and are most often encountered with expressions involving these operators: >, >=,<, <=, IN, LIKE, BETWEEN To the optimizer, this expression:

That is, there is no range search if the first character in the pattern is a wildcard.

The optimizer may change a Range to an ALL join type if a condition would examine too many index keys. Such a change is particularly likely for < and > conditions and multiple-level secondary indexes.

See: (for MyISAM indexes) /myisam/mi_range.c, mi_records_in_range().

The index Join Type

Consider this query:

If column1 is indexed, then the optimizer may choose to retrieve the values from the index rather than from the table. An index which is used this way is called a covering index in most texts. MySQL simply uses the word “index†in EXPLAIN descriptions.

For this query:

the optimizer will use join type = index only if the index has this definition:

In other words, all columns in the select list must be in the index. (The order of the columns in the index does not matter.) Thus it might make sense to define a multiple-column index strictly for use as a covering index, regardless of search considerations.

The Index Merge Join Type

Index Merge is used when table condition can be converted to form:

The conditions for conversion are that each cond_i can be used for a range scan, and no pair (cond_i, cond_j) uses the same index. (If cond_i and cond_j use the same index, then cond_i OR cond_j can be combined into a single range scan and no merging is necessary.)

For example, Index Merge can be used for the following queries:

Index Merge is implemented as a “container” for range key scans constructed from cond_i conditions. When doing Index Merge, MySQL retrieves rows for each of the keyscans and then runs them through a duplicate elimination procedure. Currently the Unique class is used for duplicate elimination.

Index Merge Optimizer

A single SEL_TREE object cannot be constructed for conditions that have different members of keys in the OR clause, like in condition:

Beginning with MySQL 5.0, these conditions are handled with the Index Merge method, and its range optimizer structure, class SEL_IMERGE. SEL_IMERGE represents a disjunction of several SEL_TREE objects, which can be expressed as:

where each of t_i stands for a SEL_TREE object, and no pair (t_i, t_j) of distinct SEL_TREE objects can be combined into single SEL_TREE object.

The current implementation builds SEL_IMERGE only if no single SEL_TREE object can be built for the part of the query condition it has analyzed, and discards SEL_TREE immediately if it discovers that a single SEL_TREE object can be constructed. This is actually a limitation, and can cause worse row retrieval strategy to be used. E.g. for query:

scan on badkey will be chosen even if Index Merge on (goodkey1, goodkey) would be faster.

The Index Merge optimizer collects a list of possible ways to access rows with Index Merge. This list of SEL_IMERGE structures represents the following condition:

where t_ij is one SEL_TREE and one line is for one SEL_IMERGE object. The SEL_IMERGE object with minimal cost is used for row retrieval.

In sql/opt_range.cc, see imerge_list_and_list(), imerge_list_or_list(), and SEL_IMERGE class member functions for more details of Index Merge construction.

See the get_index_merge_params function in the same file for Index Merge cost calculation algorithm.

The range Optimizer

For range queries, the MySQL optimizer builds a SEL_TREE object which represents a condition in this form:

Each of cond_key_i is a condition that refers to components of one key. MySQL creates a cond_key_i condition for each of the usable keys. Then the cheapest condition cond_key_i is used for doing range scan.

A single cond_key_i condition is represented by a pointer-linked network of SEL_ARG objects. Each SEL_ARG object refers to particular part of the key and represents the following condition:

  1. is for an interval, possibly without upper or lower bound, either including or not including bound- ary values.
  2. is for a SEL_ARG object with condition on next key component.
  3. is for a SEL_ARG object with an interval on the same field as this SEL_ARG object. Intervals between the current and "left"objects are disjoint and
    left_sel_arg_cond.sup_val <=inf_val.
  4. is for a SEL_ARG object with an interval on the same field as this SEL_ARG object. Intervals between the current and "right "objects are disjoint and left_sel_arg_cond.min_val >=max_val.

MySQL is able to convert arbitrary-depth nested AND-OR conditions to the above conjunctive form.

Row Retrieval Algorithm

Index Merge works in two steps:

Preparation step:

Row retrieval step:

See: sql/opt_range.cc, QUICK_INDEX_MERGE_SELECT class members for Index Merge row retrieval code.

Transpositions

MySQL supports transpositions (reversing the order of operands around a relational operator) for simple

Transpositions to expressions of the form column = constant are ideal for index lookups. If an expression of this form refers to an indexed column, then MySQL always uses the index, regardless of the table size. (Exception: If the table has only zero rows or only one row, it is a constant table and receives special treatment. See Section 3.2.1.4, “Constants and Constant Tablesâ€.)

AND Relations

An ANDed search has the form condition1 AND condition2, as in this example:

Here, the optimizer's decision process can be described as follows:

  • If (neither condition is indexed) use sequential scan.
  • Otherwise, if (one condition has better join type) then pick a driver based on join type (see Section
  • Otherwise, since (both conditions are indexed and have equal join type) pick a driver based on the first index that was created.

The optimizer can also choose to perform an index_merge index intersection, as described in Index Merge Optimization
[http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html].

Here's an example:

When choosing a strategy to solve this query, the optimizer picks s2 = 5 as the driver because the index for s2 was created first. Regard this as an accidental effect rather than a rule — it could change at any moment.

OR Relations

An ORed search has the form condition1 OR condition2, as in this example:

Here the optimizer's decision is to use a sequential scan. There is also an option to use index merge under such circumstances. Merge Optimizer†and Index Merge Optimization. The above warning does not apply if the same column is used in both conditions. For example:

In such a case, the search is indexed because the expression is a range search. This subject will be revisited during the discussion of the IN predicate.

UNION Queries

All SELECT statements within a UNION are optimized separately. Therefore, for this query:

if both column1 and column2 are indexed, then each SELECT is done using an indexed search, and the result sets are merged. Notice that this query might produce the same results as the query used in the OR example, which uses a sequential scan.

NOT (<>) Relations

However, MySQL does not transform in this circumstance. If you think that a range search would be better, then you should do your own transforming in such cases. It is also a logical rule that

However, MySQL does not transform in this circumstance either.We expect to add optimizations for both the previous cases.

ORDER BY Clauses

In general, the optimizer will skip the sort procedure for the ORDER BY clause if it sees that the rows will be in order anyway. But let's examine some exceptional situations.

For the query:

the optimizer will throw out the ORDER BY clause. This is another example of dead code elimination.

For the query:

the optimizer will use an index on column1, if it exists.

For the query:

the optimizer will use an index on column1, if it exists. But don't let that fool you! The index is only for finding the values. (It's cheaper to do a sequential scan of the index than a sequential scan of the table, that's why index is a better join type than ALL — “The index Join Typeâ€.)

There will still be a full sort of the results.

For the query:

if both column1 and column2 are indexed, the optimizer will choose an index on ... column1. The fact that ordering takes place by column2 values does not affect the choice of driver in this case.

ORDER BY Optimization [http: //dev. mysql. com /doc/ refman/ 5.1/ en/ order- by- optimization. html], provides a description of the internal sort procedure which we will not repeat here, but urge you to read, because it describes how the buffering and the quicksort mechanisms operate.

GROUP BY and Related Conditions

These are the main optimizations that take place for GROUP BY and related items (HAVING, COUNT(), MAX(), MIN(), SUM(), AVG(), DISTINCT()).

  • GROUP BY will use an index, if one exists.
  • GROUP BY will use sorting, if there is no index. The optimizer may choose to use a hash table.
  • For the case GROUP BY x ORDER BY x, the optimizer will realize that the ORDER BY is unnecessary, because the GROUP BY comes out in order by x.
  • The optimizer contains code for shifting certain HAVING conditions to the WHERE clause; however,this code is not operative at time of writing. See: /sql/sql_select.cc, JOIN::optimize(), after #ifdef HAVE_REF_TO_FIELDS.
  • If the table handler has a quick row-count available, then the query

gets the count without going through all the rows. This is true for MyISAM tables, but not for InnoDB tables. Note that the query

is not subject to the same optimization, unless column1 is defined as NOT NULL.

  • New optimizations exist for MAX() and MIN(). For example, consider the query

If column1 is indexed, then it's easy to find the highest value by looking for 'a' in the index and going back to the key before that.

  • The optimizer transforms queries of the form

if and only if both of these conditions are true:

  • The GROUP BY can be done with an index. (This implies that there is only one table in the
    FROM clause, and no WHERE clause.)
  • There is no LIMIT clause.

Because DISTINCT is not always transformed to GROUP BY, do not expect that queries with DISTINCT will always cause ordered result sets. (You can, however, rely on that rule with GROUP BY,unless the query includes ORDER BY NULL.)See: /sql/sql_select.cc, opt_sum_query(), and /sql/sql_select.cc, remove_duplicates().


Face Book Twitter Google Plus Instagram Youtube Linkedin Myspace Pinterest Soundcloud Wikipedia

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

MySQL Topics