The DROP INDEX statement removes a user-defined index from the database.
Use DROP INDEX also when an index needs to have a change of structure: segments added, removed, or reordered, or the sort order altered. First use a DROP INDEX statement to delete the index, and then use a CREATE INDEX statement to re-create it, using the same name and the new characteristics.
This is the syntax:
DROP INDEX name;
The following statement deletes an index from the JOB table:DROP INDEX MINSALX;
No user can drop an index except the user who created it, or SYSDBA, or (on POSIX) a user with root privileges.
System-defined indexes, created automatically on columns defined with UNIQUE, PRIMARY KEY, and FOREIGN KEY constraints, cannot be dropped. It is necessary to drop the constraints in order to drop these indexes.
OPTIMIZATION TOPIC: Optimal Indexing
Unlike many other relational database systems, Firebird never has need of a full-time DBA with an armory of algorithms for keeping databases running smoothly. For the most part, well-kept Firebird databases just “keep on keeping on.”
Indexes do play an important part in the performance of a database. It is important to recognize that they are dynamic structures that, like moving parts in an engine, need to be “cleaned and lubed” from time to time. This section provides some guidelines for keeping your indexes working at full capacity.
Indexes are binary structures that may become unbalanced after many changes to the database, especially if general database housekeeping is neglected. Indexes can be rebalanced and tuned in a number of ways to restore performance to optimal levels.
Improving Index Selectivity
Broadly, the selectivity of an index is an evaluation of the number of rows that would be selected by each index value in a search. A unique index has the highest possible selectivity, because it can never select more than one row per value, whereas an index on a Boolean has almost the lowest selectivity.
Indexing a column that, in production conditions, will store a single value predominantly, such as the Country of Birth item in the earlier election data example, is worse than not indexing the column at all. Firebird is quite efficient at building bitmaps for non-indexed sorts and searches.
The selectivity of a unique index is 1. All non-unique indexes have a value lower than 1.
Selectivity (s) is calculated as
s = n / number of rows in the table
where n is the number of distinct occurrences of the index value in the table. The smaller the number of distinct occurrences, the lower the selectivity. Indexes with higher selectivity perform better than those with low selectivity.
The Firebird optimizer looks up a factor for calculating selectivity when a table is first accessed and stores it in memory for use in calculating plans for subsequent queries on that table. Over time, the initially calculated factors on frequently updated tables gradually become outdated, perhaps affecting the optimizer’s index choices in extreme cases.
Recalculating index selectivity updates a statistical factor stored in the system tables. The optimizer reads this just once when choosing a plan—it is not highly significant to its choices. Frequent, large DML operations do not necessarily spoil the distribution of distinct index key values. If the indexing is reasonable, the “demographics” of value distribution may change very little.
Knowing the most accurate selectivity of an index has its greatest value to the developer. It provides a metric to assist in determining the usefulness of the index.
If the efficacy of a plan degrades over time because of large numbers of inserts or changes to the key column(s) that change the distribution of index key values, the query may slow down gradually. Any index whose selectivity drops dramatically over time should be dropped because of its effect on performance.
Understanding and dealing with a rogue index that deteriorates with table growth, to the extent that it interferes with plans, is an important part of database tuning. However, the most crucial effect of using an index that intrinsically has very low selectivity has nothing to do with the optimizer and everything to do with index geometry.
Why Low Selectivity Is a Killer
Firebird builds binary trees for indexes. It stores these structures on index pages, which are nothing more special than pages that are allocated for storing index trees. Each distinct value in an index segment has its own node off the root of the tree. As a new entry is added to an index, it is either placed into a new node if its value does not already exist or stacked on top of any existing duplicate values.
Figure illustrates this binary mechanism in a simplified form.
Building a binary index tree
When duplicate values arrive, they are slotted into the first node at the front of the “chain” of other duplicates —this is what is occurring with value “ghi” in the diagram. This structure is referred to as a duplicate chain.
A duplicate chain, per se, is fine—all non-unique indexes have them. An update of the segment value, or the deletion of a row, is very costly when the chain of duplicates is very long. One of the worst things you can do to a Firebird database is define a table with a million rows, all with the same key value for a secondary index, and then delete all those rows. The last duplicate stored appears first in the list, and the first stored is last. The deletion ordinarily takes the first row stored first, then the second, and so on.The index walking code has to run through the whole duplicate chain for each deletion, always finding the entry it wants in the very last position. To quote Ann Harrison, “It churns the cache like nothing you ever saw.”
The cost of all that churning and roiling is never borne by the transaction that deletes or updates all the rows in a table. Updating a key value or deleting a row affects the index later, when the old back version is garbage collected. The cost goes to the next transaction that touches the row, following the completion of all of the transactions that were active when the update or delete occurred.
The standard Firebird installation provides a number of tools and tricks for querying the state of indexes and for keeping them in good shape:
Using SET STATISTICS
In some tables, the number of duplicate values in indexed columns can increase or decrease radically as a result of the relative “popularity” of a particular value in the index when compared with other candidate values. For example, indexes on dates in a sales system might tend to become less selective when business starts to boom.
Periodically recomputing index selectivity can improve the performance of indexes that are subject to significant swings in the distribution of distinct values. SET STATISTICS recomputes the selectivity of an index. It is a statement that can be run from an isql interactive session or, in Embedded SQL (ESQL), can be passed from an application. To run this statement, you need to be connected to the database and logged in as the user that created the index, the SYSDBA user, or (on POSIX) a user with operating system root privileges.
This is the syntax:SET STATISTICS INDEX name
The following statement recomputes the selectivity for an index in the employee.gdb database:SET STATISTICS INDEX MINSALX;
On its own, SET STATISTICS will not cure current problems resulting from previous index maintenance that depended on obsolete selectivity statistics, since it does not rebuild the index.
Getting Index Statistics
Firebird provides a command-line tool that displays real-time statistical reports about the state of objects in a database. The tool produces a number of reports about what is going on in a database. Our main focus for this section is the index statistics. The other reports are described after the index reports.
gstat Command-Line Tool
You need to run gstat on the server machine since it is a completely local program that does not access databases as a client. Its default location is the /bin directory of your Firebird installation. It reports on a specified database and can be used by the SYSDBA or the owner of the database.
gstat on POSIX
Because gstat accesses database files at the filesystem level, it is necessary on Linux and UNIX platforms to have system-level read access to the file. You can achieve this in one of the following two ways:
The gstat Interface
Unlike some of the other command-line tools, gstat does not have its own shell interface. Each request involves calling gstat with switches.
This is the syntax:
gstat [switches] db_name
db _name is the fully qualified local path to the database you want to query.
Gstat is not user-friendly, and some GUI tools do a neater job of reproducing gstat’s output, generating the equivalent of the gstat switches though the Services API (“Services Manager”). Upcoming screenshots were taken using the open source IBOConsole utility.
Table lists the switches for gstat.
It is recommended that you pipe the output to a text file and view it with a scrolling text editor.
The –index Switch
This is the syntax:gstat -i[ndex] db_path_and_name
This switch retrieves and displays statistics about indexes in the database: average key length (bytes), total duplicates, and maximum duplicates for a single key. Include the– s[ystem] switch if you would like the system indexes in the report.
Unfortunately, there is no way to get the stats for a single index, but you can restrict the output to a single table using the –t switch followed by the name of the table. You can supply a space-separated list of table names to get output for more than one table. If your table names are case sensitive—having been declared with double- quoted identifiers—then the –t switch argument[s] must be correct for case but not double-quoted. For tables with spaces in their names, gstat –t does not work at all.
You can add the switch –system to include the system indexes in the report.
To run it over the employee database and pipe it to a text file named gstat.index.txt, do the following.
On POSIX, type (all on one line)
On Win32, type (all on one line)
Figure shows how an index page summary is displayed.
Example of index page summary output
What It All Means
A summary of the index information appears first. Table explains the entries, line by line:
gstat –i[ndex] Output
Because gstat performs its analysis at the file level, it has no concept of transactions. Consequently, index statistics include information about indexes involved in non-committed transactions.
The index is a tree structure, with pages at one level pointing to pages at the next level, and so on, down to the pages that point to actual rows. The more depth, the more levels of indirection.
Leaf buckets are the bottom-level pages of the index, which point to 5 individual rows.
In Figure, the index root page (created when the database is created) stores a pointer for each index and an offset pointing to another page of pointers that contains pointers for that index. In turn, that page points to the pages containing the leaf data—the actual data of the nodes—either directly (depth=1) or indirectly (adding one level for each level of indirection).
Two factors affect depth: page size and key size. If the depth is larger than 3 and the page size is less than 8192, increasing the page size to 8192 or 16384 should reduce the levels of indirection and help the speed.
Analyzing Some Statistics
The following excerpts are gstat –index output from a badly performing database.
The first is the supporting index that was automatically created for a foreign key:
Depth: 2, leaf buckets: 109, nodes: 73373 tells us that the bottom level of the index has 109 buckets (pages), for an index of 73373 nodes. That may not be the total row count of the table. For one thing, gstat is not aware of transactions, so it cannot tell whether it is looking at committed or uncommitted pages. For another, the column may have nulls, and they are not considered in the statistics.
The bottom level of the index—where leaf nodes are stored—has only 109 pages in it. That seems suspiciously few pages for so many rows. The next statistic tells us why.
In Average data length: 0.00, total dup: 73372, max dup: 32351, max dup is the length of the longest duplicate chain, accounting for almost half of the nodes. The total dup figure says that every node but one is duplicated.
This is a fairly classic case where the designer applies a foreign key without considering its distribution. It is probably a Boolean-style column, or perhaps a lookup table with either very few possible values or a practical bias toward a single value.
An example of this was an electoral register application that stored a column for country of residence. The electorate was approximately 3 million voters and registration was compulsory. The database supplied a COUNTRY table of more than 300 countries, keyed by the CCCIT country code. It was present on nearly every table in the database as a foreign key. The trouble was, nearly every elector lived in the same country.
The average data length is the average length of the key as stored. This has very little to do with the length as defined. An average length of zero only means that, by a process of suppression and compression, there is not enough “meat” left on which to calculate an average.
Fill distribution shows that all 109 pages are in the 80–99 percent range, which is well filled. The fill distribution is the proportion of space in each page of the index that is being used for data and pointers. Eighty to 90 percent is good. Lower fill distributions are the strongest clue that you should rebuild the index.
The next example shows the statistics for the system-generated index for the primary key of the same table:
The key length of 10 indicates that some compression is done. This is normal and good. That one page is underfilled is quite normal—the number of nodes did not fit evenly into the pages.
This database has a smaller table that holds data temporarily for validation. It is flushed and reloaded periodically. The following statistics were generated for a foreign key on that table:
Total dup and max dup are identical—every row has the same value in the index key. Selectivity does not get worse than that. The fill level is very low in all pages, suggesting spotty deletions. If this were not such a small table, it would be an index from hell.
The table—a work-in-progress queue—is very dynamic, storing up to 1000 new rows per day. After validation, the rows are moved to production tables and the rows in the WIP table were being deleted, causing the system to slow down. Frequent backup and restore of the database was necessary to get things moving.
The problem was that foreign keys should have been avoided. In such cases, if referential integrity constraints are deemed essential, they can be user-defined in triggers.
However, if database design absolutely requires foreign key constraints for volatile tables on columns with low selectivity, there are recommended ways to mitigate the overhead and index page deterioration caused by constantly deleting and repopulating the table. Monitor the fill level on the problem indexes and take action whenever it drops below 40 percent. The choice of action depends on your requirements:
Other gstat Switches
The gstat statistics can provide useful information about other activities in databases.
The –header Switch
gstat -header db_path_and_name
displays a database summary showing the header page information. Figure shows an example.
Example of gstat header page summary output
The first line displays the name and location of the primary database file. The following lines contain information from the database header page. Table describes the output.
The –data Switch
gstat -data db_path_and_name
retrieves a table-by-table database summary displaying information about data pages. To include the system tables (RDB$XXX) in the report, add the –system switch. Figure shows an example of the output.
Example of gstat data page summary output
The command-line output is similar:
For each table in the database, the figures outlined in Table are displayed.
gstat –d[ata] Output
Restricting the Output from gstat –data
If you do not want a data report for every table, the –t switch allows you to specify a list of tables in which you are interested.
The syntax isgstat -data db_path_and_name -t TABLENAME1 [TABLENAME2 [TABLENAME3 ..]]
The –r[ecords] Switch
This linegstat -r db_path_and_name
retrieves and displays record size and version statistics.
The total number of rows reported in a table could include both active and dead transactions. Record and version length are applicable to the actual user data—they do not take into account the header that prefixes every record version.
The three tables CAULDRON, CAULDRON1, and CAULDRON2 have identical metadata and cardinality of 100,000 records. The nominal, uncompressed record length is ~900 bytes.
CAULDRON is a clean table of rows with no back versions. The average stored row length is ~121 bytes—about an 87-percent compression efficiency.
CAULDRON1 has an active transaction that just executed:DELETE FROM CAULDRON1;
Every row has a zero (0.00) length because the primary record is a delete stub that contains only a row header. The committed records were all restored as back versions and compressed to the same size they had when they were the primary records. The table still has the same number of pages (4,000) as before the DELETE operation. The average fill factor rose from 85 to 95 percent to house all the delete stubs.
CAULDRON2 has an active transaction that just executed:UPDATE CAULDRON2 SET F2OFFSET = 5.0
The updated records have each grown by 2 bytes (121 to 123), attributable to lower compression.
The value 5.0 replaced many missing and zero values, which made the field value different from its neighboring fields. There are now 100,000 back versions averaging 10 bytes each. The average fill factor has risen to 99 percent and the table grew 138 pages to 4,138.
Firebird Related Interview Questions
|RDBMS Interview Questions||MySQL Interview Questions|
|Linux Interview Questions||Mac OS X Deployment Interview Questions|
|Windows Administration Interview Questions||Windows Server 2003 Interview Questions|
|SQL Interview Questions||NoSQL Interview Questions|
|Advanced C++ Interview Questions|
All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd
Wisdomjobs.com is one of the best job search sites in India.