Considerations for Sorting - Firebird

Although ordering and aggregation are two operations with distinctly different outcomes, they interact to some degree when both are used in a query, and placement of their specifying clauses is important. Under the hood they share some characteristics with respect to forming intermediate sets and using indexes.

Presentation Order of Sorting Clauses

The following abridged SELECT syntax skeleton shows the position of ORDER BY and GROUP BY clauses in an ordered or grouped specification. Both clauses are optional and both may be present:

Temporary Sort Space

Queries with ORDER BY or GROUP BY clauses “park” the intermediate sets for sorting operations in temporary storage space. It is recommended that you have storage available that is approximately 2.5 times the size of the largest table you will sort. Firebird 1.5 and higher versions can configure sort space in RAM; all versions need to have temporary disk space to use for these operations.

Sort Memory

Version 1.5 and higher set the default block size of sort memory at 1MB. This is the size of each chunk of RAM that the server will allocate, up to a default maximum of 64MB on Superserver and 8MB on Classic server. Both of these values can be reconfigured by means of the configuration parameters SortMemBlockSize and SortMemUpperLimit, respectively, in firebird.conf.

Sort Space on Disk

If sort space on disk is not allocated, the engine will store the sort files in the /tmp filesystem on POSIX or the directory pointed to by the environment variables TMP and/or TEMP on Windows.

You can explicitly configure sort space in two ways. The first is to set up a directory location using the environment variable FIREBIRD_TMP (INTERBASE_TMP for v.1.0.x). The second is to configure the directories using the configuration parameter TempDirectories in firebird.conf for v.1.5 and higher, or to add one or more temp_directory entries in isc_config (POSIX) or ibconfig (Windows) for v.1.0.x.

The default installation does not configure any explicit sort space on disk.

Indexing

Ordered sets are costly to server resources and to performance in general. When evaluating a query and determining a plan, the optimizer has to choose between three methods of accessing the sets of data (known as streams) that are contributed by the specified tables: NATURAL (search in no particular order), INDEX (use an index to control the search), and MERGE (form bitmap images of the two streams and merge them on a one-to-one basis).

When a set is read in index order, the read order is contrived (i.e., it is in out-of-storage order). The likelihood that indexed reads will involve processing multiple pages is very high, and the cost grows with the size of the table. In the worst case, I/O operations will increase.

With the MERGE (also referred to as SORT) method, each row (and thus each page) is read only once and the reads are in storage order. In Firebird 1.5, which is more likely to use memory for sorts, the MERGE method is quicker in many cases.

The importance of good indexing to speed up and rationalize the sorting process in ordered queries is rightly emphasized. In ordered sets, if conditions are right, the INDEX method with a good index will speed the output of the first rows. However, the index can also slow down fetching considerably. The cost may be that the whole set is retrieved more slowly than when the alternative MERGE access method is used. With MERGE, the first row is found more slowly but the whole set is retrieved faster.

The same trade-off does not necessarily apply to grouped queries, since they require the full set to be fetched at the server before any row can be output. With grouping conditions, it may transpire that an index worsens, rather than improves, performance.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Firebird Topics