Other Optimizations MySQL

In this section, we discuss other, more specialized optimizations performed in the MySQL server.

NULLs Filtering for ref and eq_ref Access

This section discusses the NULLs filtering optimization used for ref and eq_ref joins.

Early NULLs Filtering

Suppose we have a join order such as this one:

The original equality can be checked only after we've read the current rows of both tables tblX and tblY. The IS NOT NULL predicate can be checked after we've read the current row of table tblX. If there are any tables in the join order between tblX and tblY, the added IS NOT NULL check will allow us to skip accessing those tables. This feature is implemented in these places in the server code: The ref analyzer (contained in such functions as update_ref_and_keys()) detects and marks equalities like that shown above by setting KEY_FIELD::null_rejecting=TRUE. After the join order has been chosen, add_not_null_conds() adds appropriate IS NOT NULL predicates to the conditions of the appropriate tables. It is possible to add IS NOT NULL predicates for all equalities that could be used for ref access (and not for those that are actually used). However, this is currently not done.

Late NULLs Filtering

Suppose we have a query plan with table tblX being accessed via the ref access method:

tblX.key_part1 = expr1 AND tblX.key_part2 = expr2 AND ... Before performing an index lookup, we determine whether any of the expri values is NULL. If it is, we don't perform the lookup, but rather immediately return that the matching tuple is not found.

This optimization reuses the null_rejecting attribute produced by the early NULLs filtering code join_read_always_key().

Partitioning-Related Optimizations

This section discussions optimizations relating to MySQL Partitioning. See Partitioning [http://dev.mysql.com/doc/refman/5.1/en/partitioning.html] for general information about the partitioning implementation in MySQL 5.1 and later.

Partition pruning

The operation of partition pruning is defined as follows:

“Given a query over partitioned table, match the table DDL against any WHERE or ON clauses, and find the minimal set of partitions that must be accessed to resolve the query.”

The set of partitions thus obtained (hereafter referred to as “used”) can be smaller then the set of all table partitions. Partitions that did not get into this set (that is, those that were pruned away) will not be accessed at all: this is how query execution is made faster.

Non-Transactional Table Engines. With non-transactional tables such as MyISAM, locks are placed on entire partitioned table. It is theoretically possible to use partition pruning to improve concurrency by placing locks only on partitions that are actually used, but this is currently not implemented. Partition pruning doesn't depend on what table engine is used. Therefore its implementation is a part of the MySQL Query Optimizer. The next few sections provide a detailed description of partition pruning.

Partition Pruning Overview

Partition pruning is performed using the following steps:

  1. Analyze the WHERE clause and construct an interval graph describing the results of this analysis.
  2. Walk the graph, and find sets of partitions (or subpartitions, if necessary) to be used for each interval in the graph.
  3. Construct a set of partitions used for the entire query.

The description represented by the interval graph is structured in a “bottom-up”fashion. In the discussion that follows, we first define the term partitioning interval,then describe how partitioning interval are combined to make an interval graph, and then describe the graph “walking” process.

Partitioning Intervals-Single-Point Intervals

Let's start from simplest cases. Suppose that we have a partitioned table with N columns, using partitioning type p_type and the partitioning function p_func, represented like this:

We can calculate p_func(const1, const2 ... constN) and discover which partition can contain records matching the WHERE clause. Note that this process works for all partitioning types and all partitioning functions.

Note

This process works only if the WHERE clause is of the exact form given above — that is,each column in the table must be tested for equality with some arbitrary constant (not necessarily the same constant for each column). For example, if col1=const1 were missing from the example WHERE clause, then we would not be able to calculate the partitioning function value and so would be unable to restrict the set of partitions to those actually used.

Interval Walking

Let a partitioned table t be defined with a set of column definitions columns, a partitioning type p_type using a partitioning function p_func taking an integer column int_col, as shown here:

Now suppose that we have a query whose WHERE clause is of the form WHERE const1 <= int_col <= const2

We can reduce this case to a number of cases of single-point intervals by converting the WHERE clause into the following relation:

In the source code this conversion is referred to as interval walking. Walking over short intervals is not very expensive, since we can reduce the number of partitions to scan to a small number. However, walking over long intervals may not be very efficient — there will be lots of numbers to examine, and we are very likely to out that all partitions need to be scanned.

The threshold for interval walking is determined by

Note

The logic of the previous example also applies for a relation such as this one:

Interval mapping

Let a partitioned table t be defined as follows:

Suppose we have a query on table t whose WHERE clause is of one of the forms shown here:

  • const1 <= t.column <= const2
  • t.column <= const2
  • const1 <= t.column

Since the partitioning function is ascending, the following relationship holds:

Using A and B to denote the leftmost and rightmost parts of this relation, we can rewrite it like this:

Note

In this instance, the interval is closed and has two bounds. However, similar inferences can be performed for other kinds of intervals.

For RANGE partitioning, each partition occupies one interval on the partition function value axis, and the intervals are disjoint, as shown here:
p0 p1 p2
table partitions ------x------x--------x-------->
search interval ----x==============x----------->
A B

A partition needs to be accessed if and only if its interval has a non-empty intersection with the search interval [A, B].

For LIST partitioning, each partition covers a set of points on the partition function value axis. Points produced by various partitions may be interleaved, as shown here:

p0p1p2 p1 p1 p0
table partitions --+---+----+----+----+----+---->
search interval ----x===================x------>
A B

A partition needs to be accessed if it has at least one point in the interval [A, B]. The set of partitions used can be determined by running from A to B and collecting partitions that have their points within this range.

Subpartitioning Intervals

In the previous sections we've described ways to infer the set of used partitions from "elementary" WHERE clauses. Everything said there about partitions also applies to subpartitions (with exception that subpartitioning by RANGE or LIST is currently not possible). Since each partition is subpartitioned in the same way, we'll find which subpartitions should be accessed within each partition.

From WHERE Clauses to Intervals

Previous sections deal with inferring the set of partitions used from WHERE clauses that represent partitioning or subpartitioning intervals. Now we look at how MySQL extracts intervals from arbitrary WHERE clauses.

The extraction process uses the Range Analyzer — a part of the MySQL optimizer that produces plans for the range access method. This is because the tasks are similar. In both cases we have a WHERE clause as input: the range access method needs index ranges (that is, intervals) to scan; partition pruning module needs partitioning intervals so that it can determine which partitions should be used.

For range access, the Range Analyzer is invoked with the WHERE clause and descriptions of table indexes.Each index is described by an ordered list of the columns which it covers:

For partition pruning, Range Analyzer is invoked with the WHERE clause and a list of table columns used by the partitioning and subpartitioning functions:

The result of the Range Analyzer's work is known as a SEL_ARG graph. This is a complex (and not yet fully documented) structure, which we will not attempt to describe here. What's important for the current discussion is that we can walk over it and collect partitioning and subpartitioning intervals.

The following example illustrates the structure and the walking process. Suppose a table t is partitioned as follows:

Now suppose that a query on table t has a highly complex WHERE clause, such as this one:

The SEL_ARG graph for this is shown here:
SEL_ARG graph
In the previous diagram, vertical edges (|) represent OR and the horizontal ones (-) represent AND (the line with both horizontal and vertical segments also represents AND).

The partition-pruning code walks the graph top to bottom and from left to right, making these inferences:

  1. Start with an empty set of used partitions at the topmost and leftmost interval.
  2. a. Perform interval analysis for pf=1; find a corresponding set of partitions P0; move right.
    b. Move right again, to sp2=40.
    c. Analyze the interval sp1='foo' AND sp2=40 interval; find that it covers rows in some subpartition SP1. Make first inference: within each partition making up set P0, mark subpartition SP1 as “used”.
    d. Move down to sp2=50.
    e. Analyze the interval sp1='foo' AND sp2=50, finding that it covers rows in some subpartition SP2. Make another inference: within each partition of set P0, mark subpartition SP2 as used.
    f. Move back to pf=1, and then down to pf=3.
  3. a. Perform interval analysis for pf=3; find a corresponding set of partitions P1; move right.
    b. Move right again, to sp2=33.
    c. Analyze the interval sp1='foo' AND sp2=33, find that it covers rows in a subpartition SP3. Make another inference: within each partition from set P1, mark subpartition SP3 as “used”.
    d. Move back to pf=3, then down to pf=4.
  4. a. Perform interval analysis for pf=4; find a corresponding set of partitions P2; move right.
    b. Perform moves and inferences analogous to what we did to the right of pf=3. There is some potential inefficiency due to the fact that that we will analyze the interval for sp1='foo' AND sp2=33 again, but this should not have much impact on overall performance.
    c. Move back to pf=3, then down to pf=8.
  5. a. Perform interval analysis for pf=8; find a corresponding set of partitions P3, move right.
    b. Now we've arrived at sp1='baz', and find that we can't move any further to the right and can't construct a subpartitioning interval. We remember this, and move back to pf=8.
    c. In the previous step we could not limit the set of subpartitions, so we make this inference: for every partition in set P3, assume that all subpartitions are active, and mark them as such.
  6. Try to move down from pf=8; find that there is nothing there; this completes the graph analysis.

Note

In certain cases the result of the RANGE optimizer will be several SEL_ARG graphs that are to be combined using OR or AND operators. This happens for WHERE clauses which either are very complicated or do not allow for the construction of a single list of intervals. In such cases, the partition pruning code takes apprpriate action, an example being this query:

No single list of intervals can be constructed in this instance, but the partition pruning code
correctly infers that the set of partitions used is a union of:
1. All subpartitions within the partition containing rows with partition_id=10; and a subpartition containing rows with subpartition_id=20 within each partition.

Partition Pruning in the Source Code

Here is a short walkthrough of what is where in the code:

  • sql/opt_range.cc:
    This file contains the implementation of what is described “From WHERE Clauses to Intervals”. The entry point is the function prune_partitions().

    There are also detailed code-level comments about partition pruning; search for PartitionPruningModule to find the starting point.

  • sql/partition_info.h:
  • sql/sql_partition.cc:
    This file contains the functions implementing all types of partitioning interval analysis.

Partition selection

If a partitioned table is accessed in a series of index lookups (that is, using the ref, eq_ref, or methods), MySQL checks to see whether it needs to make index lookups in all partitions or that it can limit access to a particular partition. This is performed for each index lookup.

Consider this example:

In the index_read() call, the partitioned table handler will discover that the value of all partitioning columns (in this case, the single column b) is fixed, and find a single partition to access. If this partition was pruned away, then no partitions will be accessed at all.



Face Book Twitter Google Plus Instagram Youtube Linkedin Myspace Pinterest Soundcloud Wikipedia

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

MySQL Topics