Dropping an Index - Firebird

The DROP INDEX statement removes a user-defined index from the database.

Use DROP INDEX also when an index's structure needs to change: segments added, removed, or reordered, or the sort order altered. First delete the index with a DROP INDEX statement, and then re-create it with a CREATE INDEX statement, using the same name and the new characteristics.

This is the syntax:

DROP INDEX name;

The following statement deletes an index from the JOB table:

DROP INDEX MINSALX;
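
To change the index's structure, follow the DROP INDEX statement with a CREATE INDEX statement that re-creates the index under the same name with the new characteristics. A sketch (the column list and sort order shown here are illustrative only):

CREATE DESCENDING INDEX MINSALX ON JOB (MIN_SALARY);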

Restrictions

No user can drop an index except the user who created it, or SYSDBA, or (on POSIX) a user with root privileges.

System-defined indexes, created automatically on columns defined with UNIQUE, PRIMARY KEY, and FOREIGN KEY constraints, cannot be dropped. It is necessary to drop the constraints in order to drop these indexes.
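
For example, dropping a constraint also removes its supporting index. The constraint name below is hypothetical; Firebird generates names of the form INTEG_nn unless the constraint was named explicitly:

ALTER TABLE JOB DROP CONSTRAINT INTEG_7;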

OPTIMIZATION TOPIC: Optimal Indexing

Unlike many other relational database systems, Firebird never has need of a full-time DBA with an armory of algorithms for keeping databases running smoothly. For the most part, well-kept Firebird databases just “keep on keeping on.”

Indexes do play an important part in the performance of a database. It is important to recognize that they are dynamic structures that, like moving parts in an engine, need to be “cleaned and lubed” from time to time. This section provides some guidelines for keeping your indexes working at full capacity.

Housekeeping Indexes

Indexes are binary structures that may become unbalanced after many changes to the database, especially if general database housekeeping is neglected. Indexes can be rebalanced and tuned in a number of ways to restore performance to optimal levels.

  • Rebuilding an index will restore the balance of its tree structure by removing entries made obsolete by deletions and redistributing branches created by successive insertions. The tool for switching the state of an index between active and inactive is the ALTER INDEX statement (a sketch follows this list).
  • A complete rebuild of an index from scratch, by dropping and re-creating it in a pristine state, may improve the performance of an old index on a very large or dynamic table.
  • Restoring a database from a gbak backup also re-creates pristine indexes.
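
A minimal sketch of rebuilding a user-defined index by toggling its state; the index name is illustrative, and the index must not be in use at the time:

ALTER INDEX SALESTATX INACTIVE;
COMMIT;
ALTER INDEX SALESTATX ACTIVE;
COMMIT;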

Improving Index Selectivity

Broadly, the selectivity of an index is an evaluation of the number of rows that would be selected by each index value in a search. A unique index has the highest possible selectivity, because it can never select more than one row per value, whereas an index on a Boolean has almost the lowest selectivity.

Indexing a column that, in production conditions, will predominantly store a single value, such as the country column in the election data example discussed later in this chapter, is worse than not indexing the column at all. Firebird is quite efficient at building bitmaps for non-indexed sorts and searches.

Measuring Selectivity

The selectivity of a unique index is 1. All non-unique indexes have a value lower than 1.

Selectivity (s) is calculated as

s = n / number of rows in the table

where n is the number of distinct occurrences of the index value in the table. The smaller the number of distinct occurrences, the lower the selectivity. Indexes with higher selectivity perform better than those with low selectivity.
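
As a rough check, the ratio can be computed directly with a query. This sketch uses the JOB_COUNTRY column of the sample JOB table purely as an illustration:

SELECT COUNT(DISTINCT JOB_COUNTRY) AS distinct_values,
       COUNT(*) AS total_rows,
       CAST(COUNT(DISTINCT JOB_COUNTRY) AS DOUBLE PRECISION) / COUNT(*) AS selectivity_factor
  FROM JOB;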

The Firebird optimizer looks up a factor for calculating selectivity when a table is first accessed and stores it in memory for use in calculating plans for subsequent queries on that table. Over time, the initially calculated factors on frequently updated tables gradually become outdated, perhaps affecting the optimizer’s index choices in extreme cases.

Recalculating Selectivity

Recalculating index selectivity updates a statistical factor stored in the system tables. The optimizer reads this just once when choosing a plan—it is not highly significant to its choices. Frequent, large DML operations do not necessarily spoil the distribution of distinct index key values. If the indexing is reasonable, the “demographics” of value distribution may change very little.

Knowing the most accurate selectivity of an index has its greatest value to the developer. It provides a metric to assist in determining the usefulness of the index.

If the efficacy of a plan degrades over time because of large numbers of inserts or changes to the key column(s) that change the distribution of index key values, the query may slow down gradually. Any index whose selectivity drops dramatically over time should be dropped because of its effect on performance.

Understanding and dealing with a rogue index that deteriorates with table growth, to the extent that it interferes with plans, is an important part of database tuning. However, the most crucial effect of using an index that intrinsically has very low selectivity has nothing to do with the optimizer and everything to do with index geometry.

Why Low Selectivity Is a Killer

Firebird builds binary trees for indexes. It stores these structures on index pages, which are nothing more special than pages that are allocated for storing index trees. Each distinct value in an index segment has its own node off the root of the tree. As a new entry is added to an index, it is either placed into a new node if its value does not already exist or stacked on top of any existing duplicate values.

The figure below illustrates this binary mechanism in a simplified form.

Building a binary index tree

When duplicate values arrive, they are slotted into the first node at the front of the “chain” of other duplicates—this is what is occurring with value “ghi” in the diagram. This structure is referred to as a duplicate chain.

Duplicate Chains

A duplicate chain, per se, is fine—all non-unique indexes have them. An update of the segment value, or the deletion of a row, is very costly when the chain of duplicates is very long. One of the worst things you can do to a Firebird database is define a table with a million rows, all with the same key value for a secondary index, and then delete all those rows. The last duplicate stored appears first in the list, and the first stored is last. The deletion ordinarily takes the first row stored first, then the second, and so on. The index-walking code has to run through the whole duplicate chain for each deletion, always finding the entry it wants in the very last position. To quote Ann Harrison, “It churns the cache like nothing you ever saw.”

The cost of all that churning and roiling is never borne by the transaction that deletes or updates all the rows in a table. Updating a key value or deleting a row affects the index later, when the old back version is garbage collected. The cost goes to the next transaction that touches the row, following the completion of all of the transactions that were active when the update or delete occurred.

Index Toolkit

The standard Firebird installation provides a number of tools and tricks for querying the state of indexes and for keeping them in good shape:

  • For finding out about the selectivity and other meaningful characteristics of indexes, use gstat, a data statistics analyzer. Later in this chapter, we look at what gstat can tell you about your indexes.
  • The tool for recalculating index selectivity is the SET STATISTICS statement. SET STATISTICS does not rebuild indexes.
  • The best tool of all for cleaning up tired indexes is gbak, the backup and restore utility. Restoring a database from its latest backup will both rebuild all indexes and cause their selectivity to be freshly computed.

Using SET STATISTICS

In some tables, the number of duplicate values in indexed columns can increase or decrease radically as a result of the relative “popularity” of a particular value in the index when compared with other candidate values. For example, indexes on dates in a sales system might tend to become less selective when business starts to boom.

Periodically recomputing index selectivity can improve the performance of indexes that are subject to significant swings in the distribution of distinct values. SET STATISTICS recomputes the selectivity of an index. It is a statement that can be run from an isql interactive session or, in Embedded SQL (ESQL), can be passed from an application. To run this statement, you need to be connected to the database and logged in as the user that created the index, the SYSDBA user, or (on POSIX) a user with operating system root privileges.

This is the syntax:

SET STATISTICS INDEX name;

The following statement recomputes the selectivity for an index in the employee.gdb database:

SET STATISTICS INDEX MINSALX;

On its own, SET STATISTICS will not cure current problems resulting from previous index maintenance that depended on obsolete selectivity statistics, since it does not rebuild the index.
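
The stored factor can be inspected before and after recalculation by querying the system table RDB$INDICES, for example, for all indexes on the JOB table:

SELECT RDB$INDEX_NAME, RDB$STATISTICS
  FROM RDB$INDICES
 WHERE RDB$RELATION_NAME = 'JOB';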

Getting Index Statistics

Firebird provides a command-line tool that displays real-time statistical reports about the state of objects in a database. The tool produces a number of reports about what is going on in a database. Our main focus for this section is the index statistics. The other reports are described after the index reports.

gstat Command-Line Tool

You need to run gstat on the server machine since it is a completely local program that does not access databases as a client. Its default location is the /bin directory of your Firebird installation. It reports on a specified database and can be used by the SYSDBA or the owner of the database.

gstat on POSIX

Because gstat accesses database files at the filesystem level, it is necessary on Linux and UNIX platforms to have system-level read access to the file. You can achieve this in one of the following two ways:

  • Log in under the account that is running the server (by default, user firebird on v.1.5; root or interbas on v.1.0.x).
  • Set the system-level permissions on the database file to include read permission for your group.

The gstat Interface

Unlike some of the other command-line tools, gstat does not have its own shell interface. Each request involves calling gstat with switches.

This is the syntax:

gstat [switches] db_name

db_name is the fully qualified local path to the database you want to query.

Graphical Tools

gstat is not user-friendly, and some GUI tools do a neater job of reproducing gstat’s output, generating the equivalent of the gstat switches through the Services API (“Services Manager”). The screenshots that follow were taken using the open source IBOConsole utility.

Switches

Table lists the switches for gstat.

It is recommended that you pipe the output to a text file and view it with a scrolling text editor.

The –index Switch

This is the syntax:

gstat -i[ndex] db_path_and_name

This switch retrieves and displays statistics about indexes in the database: average key length (bytes), total duplicates, and maximum duplicates for a single key. Include the –s[ystem] switch if you would like the system indexes in the report.

Unfortunately, there is no way to get the stats for a single index, but you can restrict the output to a single table using the –t switch followed by the name of the table. You can supply a space-separated list of table names to get output for more than one table. If your table names are case sensitive—having been declared with double-quoted identifiers—then the –t switch argument[s] must be correct for case but not double-quoted. For tables with spaces in their names, gstat –t does not work at all.


To run it over the employee database and pipe it to a text file named gstat.index.txt, do the following.

On POSIX, type (all on one line)
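
(The path shown assumes the sample database in a default Firebird 1.5 installation; adjust the file name and location to suit your system.)

gstat -index /opt/firebird/examples/employee.fdb > gstat.index.txt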

On Win32, type (all on one line)
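
(Again, the installation path is an assumption; quote the path if it contains spaces.)

gstat -index "C:\Program Files\Firebird\Firebird_1_5\examples\employee.fdb" > gstat.index.txt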

The figure below shows how an index page summary is displayed.

Example of index page summary output

What It All Means

A summary of the index information appears first. The table below explains the entries, line by line:

gstat –i[ndex] Output

Because gstat performs its analysis at the file level, it has no concept of transactions. Consequently, index statistics include information about indexes involved in non-committed transactions.

Index Depth

The index is a tree structure, with pages at one level pointing to pages at the next level, and so on, down to the pages that point to actual rows. The more depth, the more levels of indirection.

Leaf buckets are the bottom-level pages of the index, which point to individual rows.

In the figure below, the index root page (created when the database is created) stores a pointer for each index and an offset pointing to another page of pointers that contains pointers for that index. In turn, that page points to the pages containing the leaf data—the actual data of the nodes—either directly (depth=1) or indirectly (adding one level for each level of indirection).

Index depth

Two factors affect depth: page size and key size. If the depth is larger than 3 and the page size is less than 8192, increasing the page size to 8192 or 16384 should reduce the levels of indirection and help the speed.
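
The page size can only be changed by backing up the database and restoring it with the new size. A sketch with gbak, using illustrative file names (credentials omitted):

gbak -b employee.fdb employee.fbk
gbak -c -p 8192 employee.fbk employee_new.fdb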

Analyzing Some Statistics

The following excerpts are gstat –index output from a badly performing database.

Analysis 1

The first is the supporting index that was automatically created for a foreign key:
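
Reconstructed from the figures discussed below, the gstat output for this index would look roughly like this (the index name and ID are placeholders):

Index RDB$FOREIGN15 (1)
    Depth: 2, leaf buckets: 109, nodes: 73373
    Average data length: 0.00, total dup: 73372, max dup: 32351
    Fill distribution:
         0 - 19% = 0
        20 - 39% = 0
        40 - 59% = 0
        60 - 79% = 0
        80 - 99% = 109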

Depth: 2, leaf buckets: 109, nodes: 73373 tells us that the bottom level of the index has 109 buckets (pages), for an index of 73373 nodes. That may not be the total row count of the table. For one thing, gstat is not aware of transactions, so it cannot tell whether it is looking at committed or uncommitted pages. For another, the column may have nulls, and they are not considered in the statistics.

The bottom level of the index—where leaf nodes are stored—has only 109 pages in it. That seems suspiciously few pages for so many rows. The next statistic tells us why.

In Average data length: 0.00, total dup: 73372, max dup: 32351, max dup is the length of the longest duplicate chain, accounting for almost half of the nodes. The total dup figure says that every node but one is duplicated.

This is a fairly classic case where the designer applies a foreign key without considering its distribution. It is probably a Boolean-style column, or perhaps a lookup table with either very few possible values or a practical bias toward a single value.

An example of this was an electoral register application that stored a column for country of residence. The electorate was approximately 3 million voters and registration was compulsory. The database supplied a COUNTRY table of more than 300 countries, keyed by the CCITT country code. It was present on nearly every table in the database as a foreign key. The trouble was, nearly every elector lived in the same country.

The average data length is the average length of the key as stored. This has very little to do with the length as defined. An average length of zero only means that, by a process of suppression and compression, there is not enough “meat” left on which to calculate an average.

Fill distribution shows that all 109 pages are in the 80–99 percent range, which is well filled. The fill distribution is the proportion of space in each page of the index that is being used for data and pointers. Eighty to 90 percent is good. Lower fill distributions are the strongest clue that you should rebuild the index.

Analysis 2

The next example shows the statistics for the system-generated index for the primary key of the same table:

The key length of 10 indicates that some compression is done. This is normal and good. That one page is underfilled is quite normal—the number of nodes did not fit evenly into the pages.

Analysis 3

This database has a smaller table that holds data temporarily for validation. It is flushed and reloaded periodically. The following statistics were generated for a foreign key on that table:

Total dup and max dup are identical—every row has the same value in the index key. Selectivity does not get worse than that. The fill level is very low in all pages, suggesting spotty deletions. If this were not such a small table, it would be an index from hell.

The table—a work-in-progress queue—is very dynamic, storing up to 1,000 new rows per day. After validation, the rows are moved to production tables and deleted from the WIP table, and it was these constant deletions that caused the system to slow down. Frequent backup and restore of the database was necessary to get things moving again.

The problem was the foreign key constraint itself: it should have been avoided. In such cases, if referential integrity enforcement is deemed essential, it can be implemented with user-defined triggers instead.

However, if database design absolutely requires foreign key constraints for volatile tables on columns with low selectivity, there are recommended ways to mitigate the overhead and index page deterioration caused by constantly deleting and repopulating the table. Monitor the fill level on the problem indexes and take action whenever it drops below 40 percent. The choice of action depends on your requirements:

  • First choice, if possible, is to delete all the rows at once rather than deleting them one by one, at random. Drop the foreign key constraint, delete the rows, and commit that transaction. Re-create the constraint (see the sketch after this list). As long as there are no long-running transactions inhibiting garbage collection, the new index will be completely empty.
  • If the deletions must be incremental, choose a time to get exclusive access and use ALTER INDEX to rebuild the index. It will be faster and more predictable than incremental garbage collection on huge duplicate chains.
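
A sketch of the first approach, with hypothetical table, column, and constraint names:

ALTER TABLE WIP DROP CONSTRAINT FK_WIP_COUNTRY;
COMMIT;
DELETE FROM WIP;
COMMIT;
ALTER TABLE WIP ADD CONSTRAINT FK_WIP_COUNTRY
  FOREIGN KEY (COUNTRY_CODE) REFERENCES COUNTRY (COUNTRY_CODE);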

Other gstat Switches

The gstat statistics can provide useful information about other activities in databases.

The –header Switch

This line:

gstat -header db_path_and_name

displays a database summary showing the header page information. The figure below shows an example.

Example of gstat header page summary output

The first line displays the name and location of the primary database file. The following lines contain information from the database header page. Table describes the output.

gstat –h[eader] Output

The –data Switch

This line

gstat -data db_path_and_name

retrieves a table-by-table database summary displaying information about data pages. To include the system tables (RDB$XXX) in the report, add the –system switch. The figure below shows an example of the output.

Example of gstat data page summary output

The command-line output is similar:

For each table in the database, the figures outlined in the table below are displayed.

gstat –d[ata] Output

Restricting the Output from gstat –data

If you do not want a data report for every table, the –t switch allows you to specify a list of tables in which you are interested.

The syntax is

gstat -data db_path_and_name -t TABLENAME1 [TABLENAME2 [TABLENAME3 ..]]

The –r[ecords] Switch

This line

gstat -r db_path_and_name

retrieves and displays record size and version statistics.

  • For rows: Average row length (bytes) and total rows in the table
  • For back versions: Average version length (bytes), total versions in the table, and maximum version chain for a record

The total number of rows reported in a table could include both active and dead transactions. Record and version length are applicable to the actual user data—they do not take into account the header that prefixes every record version.

Example

The three tables CAULDRON, CAULDRON1, and CAULDRON2 have identical metadata and cardinality of 100,000 records. The nominal, uncompressed record length is ~900 bytes.

CAULDRON is a clean table of rows with no back versions. The average stored row length is ~121 bytes—about an 87-percent compression efficiency.

CAULDRON1 has an active transaction that just executed:

DELETE FROM CAULDRON1;

Every row has a zero (0.00) length because the primary record is a delete stub that contains only a row header. The committed records were all restored as back versions and compressed to the same size they had when they were the primary records. The table still has the same number of pages (4,000) as before the DELETE operation. The average fill factor rose from 85 to 95 percent to house all the delete stubs.

CAULDRON2 has an active transaction that just executed:

UPDATE CAULDRON2 SET F2OFFSET = 5.0;

The updated records have each grown by 2 bytes (121 to 123), attributable to lower compression.

The value 5.0 replaced many missing and zero values, which made the field value different from its neighboring fields. There are now 100,000 back versions averaging 10 bytes each. The average fill factor has risen to 99 percent and the table grew 138 pages to 4,138.

