Indexing and Hashing - Database system concepts

Many queries reference only a small proportion of the records in a file. For example, a query like “Find all accounts at the Perryridge branch” or “Find the balance of account number A-101” references only a fraction of the account records. It is inefficient for the system to read every record and to check the branch-name field for the name “Perryridge,” or the account-number field for the value A-101. Ideally, the system should be able to locate these records directly. To allow these forms of access, we design additional structures that we associate with files.

Basic Concepts

An index for a file in a database system works in much the same way as the index in this textbook. If we want to learn about a particular topic (specified by a word or a phrase) in this textbook, we can search for the topic in the index at the back of the book, find the pages where it occurs, and then read the pages to find the information we are looking for. The words in the index are in sorted order, making it easy to find the word we are looking for. Moreover, the index is much smaller than the book, further reducing the effort needed to find the words we are looking for.

Card catalogs in libraries worked in a similar manner (although they are rarely used any longer). To find a book by a particular author, we would search in the author catalog, and a card in the catalog tells us where to find the book. To assist us in searching the catalog, the library would keep the cards in alphabetic order by authors, with one card for each author of each book.

Database system indices play the same role as book indices or card catalogs in libraries. For example, to retrieve an account record given the account number, the database system would look up an index to find on which disk block the corresponding record resides, and then fetch the disk block, to get the account record. Keeping a sorted list of account numbers would not work well on very large databases with millions of accounts, since the index would itself be very big; further, even though keeping the index sorted reduces the search time, finding an account can still be rather time-consuming. Instead, more sophisticated indexing techniques may be used. We shall discuss several of these techniques in this chapter.

There are two basic kinds of indices:

  1. Ordered indices. Based on a sorted ordering of the values.
  2. Hash indices. Based on a uniform distribution of values across a range of buckets. The bucket to which a value is assigned is determined by a function, called a hash function.

We shall consider several techniques for both ordered indexing and hashing. No one technique is the best. Rather, each technique is best suited to particular database applications. Each technique must be evaluated on the basis of these factors:

  • Access types: The types of access that are supported efficiently. Access types can include finding records with a specified attribute value and finding records whose attribute values fall in a specified range.
  • Access time: The time it takes to find a particular data item, or set of items, using the technique in question.
  • Insertion time: The time it takes to insert a new data item. This value includes the time it takes to find the correct place to insert the new data item, as well as the time it takes to update the index structure.
  • Deletion time: The time it takes to delete a data item. This value includes the time it takes to find the item to be deleted, as well as the time it takes to update the index structure.
  • Space overhead: The additional space occupied by an index structure. Provided that the amount of additional space is moderate, it is usually worthwhile to sacrifice the space to achieve improved performance.

We often want to have more than one index for a file. For example, libraries maintained several card catalogs: for author, for subject, and for title. An attribute or set of attributes used to look up records in a file is called a search key. Note that this definition of key differs from that used in primary key, candidate key, and superkey. This duplicate meaning for key is (unfortunately) well established in practice. Using our notion of a search key, we see that if there are several indices on a file, there are several search keys.

Ordered Indices

To gain fast random access to records in a file, we can use an index structure. Each index structure is associated with a particular search key. Just like the index of a book or a library catalog, an ordered index stores the values of the search keys in sorted order, and associates with each search key the records that contain it. The records in the indexed file may themselves be stored in some sorted order, just as books in a library are stored according to some attribute such as the Dewey decimal number. A file may have several indices, on different search keys. If the file containing the records is sequentially ordered, a primary index is an index whose search key also defines the sequential order of the file. (The term primary index is sometimes used to mean an index on a primary key. However, such usage is nonstandard and should be avoided.) Primary indices are also called clustering indices. The search key of a primary index is usually the primary key, although that is not necessarily so. Indices whose search key specifies an order different from the sequential order of the file are called secondary indices, or nonclustering indices.

Sequential file for account records.


Primary Index

In this section, we assume that all files are ordered sequentially on some search key. Such files, with a primary index on the search key, are called index-sequential files. They represent one of the oldest index schemes used in database systems. They are designed for applications that require both sequential processing of the entire file and random access to individual records.

Figure shows a sequential file of account records taken from our banking example. In the example of Figure , the records are stored in search-key order, with branch-name used as the search key.

Dense and Sparse Indices

An index record, or index entry, consists of a search-key value, and pointers to one or more records with that value as their search-key value. The pointer to a record consists of the identifier of a disk block and an offset within the disk block to identify the record within the block.

There are two types of ordered indices that we can use:

  • Dense index: An index record appears for every search-key value in the file. In a dense primary index, the index record contains the search-key value and a pointer to the first data record with that search-key value. The rest of the records with the same search-key value would be stored sequentially after the first record, since, because the index is a primary one, records are sorted on the same search key. Dense index implementations may store a list of pointers to all records with the same search-key value; doing so is not essential for primary indices.
  • Sparse index: An index record appears for only some of the search-key values.

As is true in dense indices, each index record contains a search-key value and a pointer to the first data record with that search-key value. To locate a record, we find the index entry with the largest search-key value that is less than or equal to the search-key value for which we are looking. We start at the record pointed to by that index entry, and follow the pointers in the file until we find the desired record. Figures show dense and sparse indices, respectively, for the account file. Suppose that we are looking up records for the Perryridge branch. Using the dense index of Figure , we follow the pointer directly to the first Perryridge record. We process this record, and follow the pointer in that record to locate the next record in search-key (branch-name) order. We continue processing records until we encounter a record for a branch other than Perryridge. If we are using the sparse index, we do not find an index entry for “Perryridge.” Since the last entry (in alphabetic order) before “Perryridge” is “Mianus,” we follow that pointer. We then read the account file in sequential order until we find the first Perryridge record, and begin processing at that point.
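The sparse-index lookup just described can be sketched in a few lines of Python. The file contents and pointer positions below are illustrative (modeled on the branch names of the banking example), not the exact figure from the text; record pointers are simply positions in an in-memory list.

```python
from bisect import bisect_right

# Hypothetical in-memory model: the "file" is a list of (branch_name, account)
# pairs kept in search-key (branch-name) order, as in an index-sequential file.
file = [("Brighton", "A-217"), ("Downtown", "A-101"), ("Downtown", "A-110"),
        ("Mianus", "A-215"), ("Perryridge", "A-102"), ("Perryridge", "A-201"),
        ("Perryridge", "A-218"), ("Redwood", "A-222"), ("Round Hill", "A-305")]

# A sparse index stores entries for only some search-key values.
sparse_keys = ["Brighton", "Mianus", "Redwood"]
sparse_ptrs = [0, 3, 7]          # position of the first record for each key

def sparse_lookup(key):
    """Find all records with the given search key via the sparse index."""
    # Largest index entry whose key is <= the key we seek.
    i = bisect_right(sparse_keys, key) - 1
    pos = sparse_ptrs[i] if i >= 0 else 0
    # Read the file sequentially from that point until the key is found.
    while pos < len(file) and file[pos][0] < key:
        pos += 1
    results = []
    while pos < len(file) and file[pos][0] == key:
        results.append(file[pos][1])
        pos += 1
    return results

print(sparse_lookup("Perryridge"))   # ['A-102', 'A-201', 'A-218']
```

Looking up “Perryridge” lands on the “Mianus” entry, scans forward past the Mianus record, and then collects the three Perryridge records, exactly as in the prose description.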

As we have seen, it is generally faster to locate a record if we have a dense index rather than a sparse index. However, sparse indices have advantages over dense indices in that they require less space and they impose less maintenance overhead for insertions and deletions. There is a trade-off that the system designer must make between access time and space overhead. Although the decision regarding this trade-off depends on the specific application, a good compromise is to have a sparse index with one index entry per block. The reason this design is a good trade-off is that the dominant cost in processing a database request is the time that it takes to bring a block from disk into main memory. Once we have brought in the block, the time to scan the entire block is negligible. Using this sparse index, we locate the block containing the record that we are seeking. Thus, unless the record is on an overflow block, we minimize block accesses while keeping the size of the index (and thus, our space overhead) as small as possible.

For the preceding technique to be fully general, we must consider the case where records for one search-key value occupy several blocks. It is easy to modify our scheme to handle this situation.


Dense index.

Sparse index.


Multilevel Indices

Even if we use a sparse index, the index itself may become too large for efficient processing. It is not unreasonable, in practice, to have a file with 100,000 records, with 10 records stored in each block. If we have one index record per block, the index has 10,000 records. Index records are smaller than data records, so let us assume that 100 index records fit on a block. Thus, our index occupies 100 blocks. Such large indices are stored as sequential files on disk.

If an index is sufficiently small to be kept in main memory, the search time to find an entry is low. However, if the index is so large that it must be kept on disk, a search for an entry requires several disk block reads. Binary search can be used on the index file to locate an entry, but the search still has a large cost. If the index occupies b blocks, binary search requires as many as [log2(b)] blocks to be read. ([x] denotes the least integer that is greater than or equal to x; that is, we round upward.) For our 100-block index, binary search requires seven block reads. On a disk system where a block read takes 30 milliseconds, the search will take 210 milliseconds, which is long.
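The arithmetic above can be checked directly; the 100-block index and the 30-millisecond block-read time are the figures assumed in the text.

```python
import math

blocks = 100                            # index occupies 100 blocks
reads = math.ceil(math.log2(blocks))    # binary-search worst case: ceil(log2 b)
print(reads)                            # 7
print(reads * 30)                       # 210 milliseconds at 30 ms per read
```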

Note that, if overflow blocks have been used, binary search will not be possible. In that case, a sequential search is typically used, and that requires b block reads, which will take even longer. Thus, the process of searching a large index may be costly. To deal with this problem, we treat the index just as we would treat any other sequential file, and construct a sparse index on the primary index, as in Figure .

To locate a record, we first use binary search on the outer index to find the record for the largest search-key value less than or equal to the one that we desire. The pointer points to a block of the inner index. We scan this block until we find the record that has the largest search-key value less than or equal to the one that we desire. The pointer in this record points to the block of the file that contains the record for which we are looking.

Using the two levels of indexing, we have read only one index block, rather than the seven we read with binary search, if we assume that the outer index is already in main memory. If our file is extremely large, even the outer index may grow too large to fit in main memory. In such a case, we can create yet another level of index. Indeed, we can repeat this process as many times as necessary. Indices with two or more levels are called multilevel indices. Searching for records with a multilevel index requires significantly fewer I/O operations than does searching for records by binary search. Each level of index could correspond to a unit of physical storage. Thus, we may have indices at the track, cylinder, and disk levels.
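The two-level lookup described above can be sketched as follows. The outer and inner index contents are hypothetical; the outer index is assumed to fit in main memory, while each inner-index block would cost one disk read in a real system.

```python
from bisect import bisect_right

# Hypothetical two-level sparse index: the outer index holds the first key of
# each inner-index block; each inner block holds (key, data_block) entries.
outer = ["Brighton", "Perryridge"]
inner_blocks = [[("Brighton", 0), ("Mianus", 1)],
                [("Perryridge", 2), ("Round Hill", 3)]]

def two_level_lookup(key):
    # Binary search on the outer index: largest entry <= key.
    i = max(bisect_right(outer, key) - 1, 0)
    block = inner_blocks[i]          # one disk read in a real system
    # Scan the inner block: largest entry <= key.
    keys = [k for k, _ in block]
    j = max(bisect_right(keys, key) - 1, 0)
    return block[j][1]               # data block to scan sequentially

print(two_level_lookup("Redwood"))   # 2
```

Only one index block (the inner one) is fetched from disk per lookup, matching the count given in the text.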

A typical dictionary is an example of a multilevel index in the nondatabase world. The header of each page lists the first word alphabetically on that page. Such a book index is a multilevel index: The words at the top of each page of the book index form a sparse index on the contents of the dictionary pages. Multilevel indices are closely related to tree structures, such as the binary trees used for in-memory indexing. We shall examine the relationship later.

Two-level sparse index.


Index Update

Regardless of what form of index is used, every index must be updated whenever a record is either inserted into or deleted from the file. We first describe algorithms for updating single-level indices.

  • Insertion. First, the system performs a lookup using the search-key value that appears in the record to be inserted. The actions the system takes next depend on whether the index is dense or sparse:

Dense indices:

  1. If the search-key value does not appear in the index, the system inserts an index record with the search-key value in the index at the appropriate position.
  2. Otherwise the following actions are taken:
     a. If the index record stores pointers to all records with the same search-key value, the system adds a pointer to the new record to the index record.
     b. Otherwise, the index record stores a pointer to only the first record with the search-key value. The system then places the record being inserted after the other records with the same search-key values.

Sparse indices:

We assume that the index stores an entry for each block. If the system creates a new block, it inserts the first search-key value (in search-key order) appearing in the new block into the index. On the other hand, if the new record has the least search-key value in its block, the system updates the index entry pointing to the block; if not, the system makes no change to the index.

  • Deletion. To delete a record, the system first looks up the record to be deleted. The actions the system takes next depend on whether the index is dense or sparse:

Dense indices:

  1. If the deleted record was the only record with its particular search-key value, then the system deletes the corresponding index record from the index.
  2. Otherwise the following actions are taken:
     a. If the index record stores pointers to all records with the same search-key value, the system deletes the pointer to the deleted record from the index record.
     b. Otherwise, the index record stores a pointer to only the first record with the search-key value. In this case, if the deleted record was the first record with the search-key value, the system updates the index record to point to the next record.

Sparse indices:

  1. If the index does not contain an index record with the search-key value of the deleted record, nothing needs to be done to the index.
  2. Otherwise the system takes the following actions:
     a. If the deleted record was the only record with its search key, the system replaces the corresponding index record with an index record for the next search-key value (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced.
     b. Otherwise, if the index record for the search-key value points to the record being deleted, the system updates the index record to point to the next record with the same search-key value.
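The dense-index case can be sketched concretely. This minimal sketch assumes the variant in which the index record stores pointers to all records with the same search-key value; record “pointers” are plain integers, and the keys and pointer values are illustrative.

```python
# Single-level dense-index maintenance: search-key value -> list of pointers
# to all records with that value.
index = {}

def index_insert(key, ptr):
    # If the key is absent, a new index record is created; otherwise the
    # pointer is added to the existing index record.
    index.setdefault(key, []).append(ptr)

def index_delete(key, ptr):
    ptrs = index[key]
    ptrs.remove(ptr)
    if not ptrs:              # last record with this key: drop the entry
        del index[key]

index_insert("Perryridge", 10)
index_insert("Perryridge", 11)
index_delete("Perryridge", 10)
print(index)                  # {'Perryridge': [11]}
index_delete("Perryridge", 11)
print(index)                  # {}
```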

Insertion and deletion algorithms for multilevel indices are a simple extension of the scheme just described. On deletion or insertion, the system updates the lowest-level index as described. As far as the second level is concerned, the lowest-level index is merely a file containing records; thus, if there is any change in the lowest-level index, the system updates the second-level index as described. The same technique applies to further levels of the index, if there are any.

Secondary Indices

Secondary indices must be dense, with an index entry for every search-key value, and a pointer to every record in the file. A primary index may be sparse, storing only some of the search-key values, since it is always possible to find records with intermediate search-key values by a sequential access to a part of the file, as described earlier. If a secondary index stores only some of the search-key values, records with intermediate search-key values may be anywhere in the file and, in general, we cannot find them without searching the entire file.

A secondary index on a candidate key looks just like a dense primary index, except that the records pointed to by successive values in the index are not stored sequentially. In general, however, secondary indices may have a different structure from primary indices. If the search key of a primary index is not a candidate key, it suffices if the index points to the first record with a particular value for the search key, since the other records can be fetched by a sequential scan of the file.

In contrast, if the search key of a secondary index is not a candidate key, it is not enough to point to just the first record with each search-key value. The remaining records with the same search-key value could be anywhere in the file, since the records are ordered by the search key of the primary index, rather than by the search key of the secondary index. Therefore, a secondary index must contain pointers to all the records.


Secondary index on account file, on noncandidate key balance.

We can use an extra level of indirection to implement secondary indices on search keys that are not candidate keys. The pointers in such a secondary index do not point directly to the file. Instead, each points to a bucket that contains pointers to the file. Figure shows the structure of a secondary index that uses an extra level of indirection on the account file, on the search key balance.
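The extra level of indirection can be sketched as follows. The balances and record pointers are illustrative; each index entry maps a balance to a bucket, which is simply a list of pointers to the records holding that balance.

```python
# Sketch of a secondary index on the noncandidate key balance, using an
# extra level of indirection: each entry points to a bucket of record
# pointers rather than directly into the file.
records = {0: 500, 1: 700, 2: 500, 3: 900}   # record pointer -> balance

# Build the index: one bucket per distinct balance value.
secondary = {}
for ptr, balance in records.items():
    secondary.setdefault(balance, []).append(ptr)

print(secondary[500])   # bucket of pointers to all records with balance 500
```

Because the records with balance 500 can be anywhere in a file ordered by the primary index, the bucket must list every one of them.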

A sequential scan in primary index order is efficient because records in the file are stored physically in the same order as the index order. However, we cannot (except in rare special cases) store a file physically ordered both by the search key of the primary index, and the search key of a secondary index. Because secondary-key order and physical-key order differ, if we attempt to scan the file sequentially in secondary-key order, the reading of each record is likely to require the reading of a new block from disk, which is very slow.

The procedure described earlier for deletion and insertion can also be applied to secondary indices; the actions taken are those described for dense indices storing a pointer to every record in the file. If a file has multiple indices, whenever the file is modified, every index must be updated.

Secondary indices improve the performance of queries that use keys other than the search key of the primary index. However, they impose a significant overhead on modification of the database. The designer of a database decides which secondary indices are desirable on the basis of an estimate of the relative frequency of queries and modifications.

B+-Tree Index Files

The main disadvantage of the index-sequential file organization is that performance degrades as the file grows, both for index lookups and for sequential scans through the data. Although this degradation can be remedied by reorganization of the file, frequent reorganizations are undesirable.


Typical node of a B+-tree.

The B+-tree index structure is the most widely used of several index structures that maintain their efficiency despite insertion and deletion of data. A B+-tree index takes the form of a balanced tree in which every path from the root of the tree to a leaf of the tree is of the same length. Each nonleaf node in the tree has between [n/2] and n children, where n is fixed for a particular tree.

We shall see that the B+-tree structure imposes performance overhead on insertion and deletion, and adds space overhead. The overhead is acceptable even for frequently modified files, since the cost of file reorganization is avoided. Furthermore, since nodes may be as much as half empty (if they have the minimum number of children), there is some wasted space. This space overhead, too, is acceptable given the performance benefits of the B+-tree structure.

Structure of a B+-Tree

A B+-tree index is a multilevel index, but it has a structure that differs from that of the multilevel index-sequential file. Figure shows a typical node of a B+-tree. It contains up to n − 1 search-key values K1, K2, . . . , Kn−1, and n pointers P1, P2, . . . , Pn. The search-key values within a node are kept in sorted order; thus, if i < j, then Ki < Kj.

We consider first the structure of the leaf nodes. For i = 1, 2, . . . , n − 1, pointer Pi points to either a file record with search-key value Ki or to a bucket of pointers, each of which points to a file record with search-key value Ki. The bucket structure is used only if the search key does not form a primary key, and if the file is not sorted in the search-key value order. Pointer Pn has a special purpose that we shall discuss shortly.

Figure shows one leaf node of a B+-tree for the account file, in which we have chosen n to be 3, and the search key is branch-name. Note that, since the account file is ordered by branch-name, the pointers in the leaf node point directly to the file. Now that we have seen the structure of a leaf node, let us consider how search-key values are assigned to particular nodes. Each leaf can hold up to n − 1 values. We allow leaf nodes to contain as few as [(n−1)/2] values. The ranges of values in each leaf do not overlap. Thus, if Li and Lj are leaf nodes and i < j, then every search-key value in Li is less than every search-key value in Lj. If the B+-tree index is to be a dense index, every search-key value must appear in some leaf node.

Now we can explain the use of the pointer Pn. Since there is a linear order on the leaves based on the search-key values that they contain, we use Pn to chain together the leaf nodes in search-key order. This ordering allows for efficient sequential processing of the file. The nonleaf nodes of the B+-tree form a multilevel (sparse) index on the leaf nodes. The structure of nonleaf nodes is the same as that for leaf nodes, except that all pointers are pointers to tree nodes. A nonleaf node may hold up to n pointers, and must hold at least [n/2] pointers. The number of pointers in a node is called the fanout of the node.

A leaf node for account B+-tree index (n = 3).


Let us consider a node containing m pointers. For i = 2, 3, . . .,m − 1, pointer Pi points to the subtree that contains search-key values less than Ki and greater than or equal to Ki−1. Pointer Pm points to the part of the subtree that contains those key values greater than or equal to Km−1, and pointer P1 points to the part of the subtree that contains those search-key values less than K1.

Unlike other nonleaf nodes, the root node can hold fewer than [n/2] pointers; however, it must hold at least two pointers, unless the tree consists of only one node. It is always possible to construct a B+-tree, for any n, that satisfies the preceding requirements. Figure shows a complete B+-tree for the account file (n = 3). For simplicity, we have omitted both the pointers to the file itself and the null pointers.

As an example of a B+-tree for which the root must have fewer than [n/2] values, Figure shows a B+-tree for the account file with n = 5. These examples of B+-trees are all balanced. That is, the length of every path from the root to a leaf node is the same. This property is a requirement for a B+-tree. Indeed, the “B” in B+-tree stands for “balanced.” It is the balance property of B+-trees that ensures good performance for lookup, insertion, and deletion.

B+-tree for account file (n = 3).


B+-tree for account file with n = 5.


Queries on B+-Trees

Let us consider how we process queries on a B+-tree. Suppose that we wish to find all records with a search-key value of V. Figure presents pseudocode for doing so. Intuitively, the procedure works as follows. First, we examine the root node, looking for the smallest search-key value greater than V. Suppose that we find that this search-key value is Ki. We then follow pointer Pi to another node. If we find no such value, then V ≥ Km−1, where m is the number of pointers in the node. In this case we follow Pm to another node. In the node we reached above, again we look for the smallest search-key value greater than V, and once again follow the corresponding pointer as above. Eventually, we reach a leaf node. At the leaf node, if we find search-key value Ki equals V, then pointer Pi directs us to the desired record or bucket. If the value V is not found in the leaf node, no record with key value V exists.

Thus, in processing a query, we traverse a path in the tree from the root to some leaf node. If there are K search-key values in the file, the path is no longer than [log[n/2](K)]. In practice, only a few nodes need to be accessed. Typically, a node is made to be the same size as a disk block, which is typically 4 kilobytes. With a search-key size of 12 bytes, and a disk-pointer size of 8 bytes, n is around 200. Even with a more conservative estimate of 32 bytes for the search-key size, n is around 100. With n = 100, if we have 1 million search-key values in the file, a lookup requires only [log50(1,000,000)] = 4 nodes to be accessed. Thus, at most four blocks need to be read from disk for the lookup. The root node of the tree is usually heavily accessed and is likely to be in the buffer, so typically only three or fewer blocks need to be read from disk.

Figure Querying a B+-tree.

An important difference between B+-tree structures and in-memory tree structures, such as binary trees, is the size of a node, and as a result, the height of the tree. In a binary tree, each node is small, and has at most two pointers. In a B+-tree, each node is large (typically a disk block), and a node can have a large number of pointers. Thus, B+-trees tend to be fat and short, unlike thin and tall binary trees. In a balanced binary tree, the path for a lookup can be of length [log2(K)], where K is the number of search-key values. With K = 1,000,000 as in the previous example, a balanced binary tree requires around 20 node accesses. If each node were on a different disk block, 20 block reads would be required to process a lookup, in contrast to the four block reads for the B+-tree.
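The lookup procedure described above can be sketched as follows. The node representation (dicts with key lists and child or record pointers) and the two-leaf tree are illustrative assumptions, not the structure of any particular figure; the `next` link models the leaf-chaining pointer Pn.

```python
from bisect import bisect_right

# Leaves hold keys and record pointers, plus a link to the next leaf (Pn).
leaf1 = {"leaf": True, "keys": ["Brighton", "Downtown"],
         "ptrs": ["A-212", "A-101"]}
leaf2 = {"leaf": True, "keys": ["Mianus", "Perryridge"],
         "ptrs": ["A-215", "A-102"]}
leaf1["next"] = leaf2
# Nonleaf nodes hold keys and child pointers.
root = {"leaf": False, "keys": ["Mianus"], "children": [leaf1, leaf2]}

def find(node, v):
    """Return the record pointer for search-key value v, or None."""
    while not node["leaf"]:
        # Follow P_i for the smallest K_i > v, or the last pointer if none.
        i = bisect_right(node["keys"], v)
        node = node["children"][i]
    if v in node["keys"]:
        return node["ptrs"][node["keys"].index(v)]
    return None          # no record with key value v exists

print(find(root, "Perryridge"))   # A-102
```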

Updates on B+-Trees

Insertion and deletion are more complicated than lookup, since it may be necessary to split a node that becomes too large as the result of an insertion, or to coalesce nodes (that is, combine nodes) if a node becomes too small (fewer than [n/2] pointers). Furthermore, when a node is split or a pair of nodes is combined, we must ensure that balance is preserved. To introduce the idea behind insertion and deletion in a B+-tree, we shall assume temporarily that nodes never become too large or too small.

Under this assumption, insertion and deletion are performed as defined next.

  • Insertion. Using the same technique as for lookup, we find the leaf node in which the search-key value would appear. If the search-key value already appears in the leaf node, we add the new record to the file and, if necessary, add to the bucket a pointer to the record. If the search-key value does not appear, we insert the value in the leaf node, and position it such that the search keys are still in order. We then insert the new record in the file and, if necessary, create a new bucket with the appropriate pointer.
  • Deletion. Using the same technique as for lookup, we find the record to be deleted, and remove it from the file.We remove the search-key value from the leaf node if there is no bucket associated with that search-key value or if the bucket becomes empty as a result of the deletion.

We now consider an example in which a node must be split. Assume that we wish to insert a record with a branch-name value of “Clearview” into the B+-tree of Figure . Using the algorithm for lookup, we find that “Clearview” should appear in the node containing “Brighton” and “Downtown.” There is no room to insert the search-key value “Clearview.” Therefore, the node is split into two nodes. Figure shows the two leaf nodes that result from inserting “Clearview” and splitting the node containing “Brighton” and “Downtown.” In general, we take the n search-key values (the n − 1 values in the leaf node plus the value being inserted), and put the first [n/2] in the existing node and the remaining values in a new node.
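The split rule at the end of this paragraph can be sketched directly; the function below is a simplified model that splits only the key list, ignoring the record pointers a real leaf would carry.

```python
from bisect import insort
import math

def split_leaf(keys, new_key, n):
    """Insert new_key into a full leaf (holding n - 1 keys) and split it.

    The n resulting keys are divided so that the first ceil(n/2) stay in
    the existing node and the remaining keys move to a new node."""
    keys = list(keys)
    insort(keys, new_key)          # keep search-key order
    cut = math.ceil(n / 2)
    return keys[:cut], keys[cut:]

# Inserting "Clearview" into the full leaf {"Brighton", "Downtown"} (n = 3):
old, new = split_leaf(["Brighton", "Downtown"], "Clearview", n=3)
print(old, new)   # ['Brighton', 'Clearview'] ['Downtown']
```

This reproduces the example in the text: “Brighton” and “Clearview” remain in the existing node, and “Downtown” moves to the new node, whose smallest key must then be inserted into the parent.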


Split of leaf node on insertion of “Clearview.”

Having split a leaf node, we must insert the new leaf node into the B+-tree structure. In our example, the new node has “Downtown” as its smallest search-key value. We need to insert this search-key value into the parent of the leaf node that was split. The B+-tree of Figure shows the result of the insertion. The search-key value “Downtown” was inserted into the parent. It was possible to perform this insertion because there was room for an added search-key value. If there were no room, the parent would have had to be split. In the worst case, all nodes along the path to the root must be split. If the root itself is split, the entire tree becomes deeper.

The general technique for insertion into a B+-tree is to determine the leaf node l into which insertion must occur. If a split results, insert the new node into the parent of node l. If this insertion causes a split, proceed recursively up the tree until either an insertion does not cause a split or a new root is created. Figure outlines the insertion algorithm in pseudocode. In the pseudocode, L.Ki and L.Pi denote the ith value and the ith pointer in node L, respectively. The pseudocode also makes use of the function parent(L) to find the parent of a node L.

We can compute a list of nodes in the path from the root to the leaf while initially finding the leaf node, and can use it later to find the parent of any node in the path efficiently. The pseudocode refers to inserting an entry (V, P) into a node. In the case of leaf nodes, the pointer to an entry actually precedes the key value, so the leaf node actually stores P before V. For internal nodes, P is stored just after V.

We now consider deletions that cause tree nodes to contain too few pointers. First, let us delete “Downtown” from the B+-tree of Figure . We locate the entry for “Downtown” by using our lookup algorithm. When we delete the entry for “Downtown” from its leaf node, the leaf becomes empty. Since, in our example, n = 3 and 0 < [(n−1)/2], this node must be eliminated from the B+-tree. To delete a leaf node, we must delete the pointer to it from its parent. In our example, this deletion leaves the parent node, which formerly contained three pointers, with only two pointers. Since 2 ≥ [n/2], the node is still sufficiently large, and the deletion operation is complete. The resulting B+-tree appears in Figure .


Insertion of “Clearview” into the B+-tree of Figure .

Figure: Insertion of entry in a B+-tree.

Deletion of “Downtown” from the B+-tree of Figure .

When we make a deletion from a parent of a leaf node, the parent node itself may become too small. That is exactly what happens if we delete “Perryridge” from the B+-tree of Figure . Deletion of the Perryridge entry causes a leaf node to become empty. When we delete the pointer to this node in the latter’s parent, the parent is left with only one pointer. Since n = 3, ⌈n/2⌉ = 2, and thus only one pointer is too few. However, since the parent node contains useful information, we cannot simply delete it. Instead, we look at the sibling node (the nonleaf node containing the one search key, “Mianus”). This sibling node has room to accommodate the information contained in our now-too-small node, so we coalesce these nodes, such that the sibling node now contains the keys “Mianus” and “Redwood.” The other node (the node containing only the search key “Redwood”) now contains redundant information and can be deleted from its parent (which happens to be the root in our example). Figure shows the result. Notice that the root has only one child pointer after the deletion, so it is deleted and its sole child becomes the root. So the depth of the B+-tree has been decreased by 1.

It is not always possible to coalesce nodes. As an illustration, delete “Perryridge” from the B+-tree of Figure . In this example, the “Downtown” entry is still part of the tree. Once again, the leaf node containing “Perryridge” becomes empty. The parent of the leaf node becomes too small (only one pointer). However, in this example, the sibling node already contains the maximum number of pointers: three. Thus, it cannot accommodate an additional pointer. The solution in this case is to redistribute the pointers such that each sibling has two pointers. The result appears in Figure . Note that the redistribution of values necessitates a change of a search-key value in the parent of the two siblings.

Deletion of “Perryridge” from the B+-tree of Figure.


In general, to delete a value from a B+-tree, we perform a lookup on the value and delete it. If the resulting node is too small, we delete it from its parent. The deletion algorithm is applied recursively up the tree until the root is reached, a parent remains adequately full after the deletion, or redistribution is applied.

Figure outlines the pseudocode for deletion from a B+-tree. The procedure swap_variables(L, L') merely swaps the values of the (pointer) variables L and L'; this swap has no effect on the tree itself. The pseudocode uses the condition “too few pointers/values.” For nonleaf nodes, this criterion means fewer than ⌈n/2⌉ pointers; for leaf nodes, it means fewer than ⌈(n − 1)/2⌉ values. The pseudocode redistributes entries by borrowing a single entry from an adjacent node. We could instead redistribute entries by repartitioning them equally between the two nodes. The pseudocode refers to deleting an entry (V, P) from a node. In the case of leaf nodes, the pointer to an entry actually precedes the key value, so the pointer P precedes the key value V. For internal nodes, P follows the key value V.

It is worth noting that, as a result of deletion, a key value that is present in an internal node of the B+-tree may not be present at any leaf of the tree. Although insertion and deletion operations on B+-trees are complicated, they require relatively few I/O operations, which is an important benefit since I/O operations are expensive. It can be shown that the number of I/O operations needed for a worst-case insertion or deletion is proportional to ⌈log⌈n/2⌉(K)⌉, where n is the maximum number of pointers in a node, and K is the number of search-key values. In other words, the cost of insertion and deletion operations is proportional to the height of the B+-tree, and is therefore low. It is the speed of operation on B+-trees that makes them a frequently used index structure in database implementations.
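To get a feel for the bound, a quick calculation (n = 100 pointers per node and K = 1,000,000 keys are made-up numbers):

```python
import math

# Worst-case I/O count for an insertion or deletion is proportional to
# ceil(log base ceil(n/2) of K): the height of a B+-tree whose nonleaf
# nodes each have at least ceil(n/2) children.
def worst_case_ios(n, K):
    min_fanout = math.ceil(n / 2)
    return math.ceil(math.log(K, min_fanout))

print(worst_case_ios(100, 1_000_000))   # → 4
```

Even a million keys need only about four levels, which is why B+-tree updates remain cheap as files grow.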

B+-Tree File Organization

As mentioned earlier, the main drawback of index-sequential file organization is the degradation of performance as the file grows: with growth, an increasing percentage of index records and actual records become out of order, and are stored in overflow blocks. We solve the degradation of index lookups by using B+-tree indices on the file. We solve the degradation problem for storing the actual records by using the leaf level of the B+-tree to organize the blocks containing the actual records. We use the B+-tree structure not only as an index, but also as an organizer for records in a file. In a B+-tree file organization, the leaf nodes of the tree store records, instead of storing pointers to records. Figure shows an example of a B+-tree file organization.

Since records are usually larger than pointers, the maximum number of records that can be stored in a leaf node is less than the number of pointers in a nonleaf node. However, the leaf nodes are still required to be at least half full.

Figure: Deletion of entry from a B+-tree.

Insertion and deletion of records from a B+-tree file organization are handled in the same way as insertion and deletion of entries in a B+-tree index. When a record with a given key value v is inserted, the system locates the block that should contain the record by searching the B+-tree for the largest key in the tree that is ≤ v. If the block located has enough free space for the record, the system stores the record in the block. Otherwise, as in B+-tree insertion, the system splits the block in two, and redistributes the records in it (in B+-tree key order) to create space for the new record.

The split propagates up the B+-tree in the normal fashion. When we delete a record, the system first removes it from the block containing it. If a block B becomes less than half full as a result, the records in B are redistributed with the records in an adjacent block B'. Assuming fixed-sized records, each block will hold at least one-half as many records as the maximum that it can hold. The system updates the nonleaf nodes of the B+-tree in the usual fashion.

When we use a B+-tree for file organization, space utilization is particularly important, since the space occupied by the records is likely to be much more than the space occupied by keys and pointers. We can improve the utilization of space in a B+-tree by involving more sibling nodes in redistribution during splits and merges. The technique is applicable to both leaf nodes and internal nodes, and works as follows.

During insertion, if a node is full, the system attempts to redistribute some of its entries to one of the adjacent nodes, to make space for a new entry. If this attempt fails because the adjacent nodes are themselves full, the system splits the node, and splits the entries evenly among one of the adjacent nodes and the two nodes that it obtained by splitting the original node. Since the three nodes together contain one more record than can fit in two nodes, each node will be about two-thirds full. More precisely, each node will have at least ⌊2n/3⌋ entries, where n is the maximum number of entries that the node can hold. (⌊x⌋ denotes the greatest integer that is less than or equal to x; that is, we drop the fractional part, if any.)
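The two-thirds figure can be checked with a line of arithmetic (an illustration of ours): the overfull node holds n + 1 entries and the full adjacent node holds n, so 2n + 1 entries are spread over three nodes.

```python
# Spread 2n + 1 entries as evenly as possible over three nodes and check
# that the emptiest node still holds at least floor(2n/3) entries.
def min_occupancy_after_split(n):
    total = 2 * n + 1              # full node + new entry + full sibling
    return total // 3              # the least-full of the three nodes

for n in [4, 5, 6, 100]:
    assert min_occupancy_after_split(n) >= (2 * n) // 3
```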

B+-tree file organization.

During deletion of a record, if the occupancy of a node falls below ⌊2n/3⌋, the system attempts to borrow an entry from one of the sibling nodes. If both sibling nodes have ⌊2n/3⌋ entries, instead of borrowing an entry, the system redistributes the entries in the node and in the two siblings evenly between two of the nodes, and deletes the third node. We can use this approach because the total number of entries is at most 3⌊2n/3⌋ − 1, which is less than 2n. With three adjacent nodes used for redistribution, each node can be guaranteed to have ⌊3n/4⌋ entries. In general, if m nodes (m − 1 siblings) are involved in redistribution, each node can be guaranteed to contain at least ⌊(m − 1)n/m⌋ entries. However, the cost of update becomes higher as more sibling nodes are involved in the redistribution.
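Both guarantees above are easy to sanity-check numerically (a check of ours, not from the text):

```python
# The three nodes hold at most 3*floor(2n/3) - 1 entries, which is always
# less than 2n, so two nodes of capacity n can absorb them; and with m
# nodes in a redistribution, each gets at least floor((m - 1) * n / m).
def fits_in_two(n):
    return 3 * ((2 * n) // 3) - 1 < 2 * n

def guaranteed_occupancy(m, n):
    return ((m - 1) * n) // m

assert all(fits_in_two(n) for n in range(2, 200))
assert guaranteed_occupancy(3, 6) == 4   # floor(2n/3) for n = 6
assert guaranteed_occupancy(4, 8) == 6   # floor(3n/4) for n = 8
```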

B-Tree Index Files

B-tree indices are similar to B+-tree indices. The primary distinction between the two approaches is that a B-tree eliminates the redundant storage of search-key values. In the B+-tree of Figure, the search keys “Downtown,” “Mianus,” “Redwood,” and “Perryridge” appear twice. Every search-key value appears in some leaf node; several are repeated in nonleaf nodes.

A B-tree allows search-key values to appear only once. Figure shows a B-tree that represents the same search keys as the B+-tree of Figure . Since search keys are not repeated in the B-tree, we may be able to store the index in fewer tree nodes than in the corresponding B+-tree index. However, since search keys that appear in nonleaf nodes appear nowhere else in the B-tree, we are forced to include an additional pointer field for each search key in a nonleaf node. These additional pointers point to either file records or buckets for the associated search key.

A generalized B-tree leaf node appears in Figure (a); a nonleaf node appears in Figure (b). Leaf nodes are the same as in B+-trees. In nonleaf nodes, the pointers Pi are the tree pointers that we used also for B+-trees, while the pointers Bi are bucket or file-record pointers. In the generalized B-tree in the figure, there are n − 1 keys in the leaf node, but there are m − 1 keys in the nonleaf node. This discrepancy occurs because nonleaf nodes must include pointers Bi, thus reducing the number of search keys that can be held in these nodes. Clearly, m < n, but the exact relationship between m and n depends on the relative size of search keys and pointers.

B-tree equivalent of B+-tree in Figure .

Typical nodes of a B-tree. (a) Leaf node. (b) Nonleaf node.

The number of nodes accessed in a lookup in a B-tree depends on where the search key is located. A lookup on a B+-tree requires traversal of a path from the root of the tree to some leaf node. In contrast, it is sometimes possible to find the desired value in a B-tree before reaching a leaf node. However, roughly n times as many keys are stored in the leaf level of a B-tree as in the nonleaf levels, and, since n is typically large, the benefit of finding certain values early is relatively small. Moreover, the fact that fewer search keys appear in a nonleaf B-tree node, compared to B+-trees, implies that a B-tree has a smaller fanout and therefore may have depth greater than that of the corresponding B+-tree. Thus, lookup in a B-tree is faster for some search keys but slower for others, although, in general, lookup time is still proportional to the logarithm of the number of search keys.
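The early-stop behavior can be illustrated with a toy lookup (the node layout, names, and sample keys here are our assumptions, not the book's figure):

```python
# B-tree lookup: each nonleaf key Ki carries its own record pointer Bi,
# so the search can end at a nonleaf node; a B+-tree lookup cannot.
def btree_find(node, key):
    while True:
        if key in node["keys"]:
            i = node["keys"].index(key)
            return node["record_ptrs"][i]   # may stop before any leaf
        if node["children"] is None:
            return None                     # leaf reached, key absent
        i = 0
        while i < len(node["keys"]) and key > node["keys"][i]:
            i += 1
        node = node["children"][i]

leaf1 = {"keys": ["Brighton", "Clearview"],
         "record_ptrs": ["r1", "r2"], "children": None}
leaf2 = {"keys": ["Mianus"], "record_ptrs": ["r3"], "children": None}
root = {"keys": ["Downtown"], "record_ptrs": ["r0"],
        "children": [leaf1, leaf2]}

print(btree_find(root, "Downtown"))   # → r0 (found in the root itself)
print(btree_find(root, "Mianus"))     # → r3 (found only at a leaf)
```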

Deletion in a B-tree is more complicated. In a B+-tree, the deleted entry always appears in a leaf. In a B-tree, the deleted entry may appear in a nonleaf node. The proper value must be selected as a replacement from the subtree of the node containing the deleted entry. Specifically, if search key Ki is deleted, the smallest search key appearing in the subtree of pointer Pi+1 must be moved to the field formerly occupied by Ki. Further actions need to be taken if the leaf node now has too few entries.

In contrast, insertion in a B-tree is only slightly more complicated than is insertion in a B+-tree. The space advantages of B-trees are marginal for large indices, and usually do not outweigh the disadvantages that we have noted. Thus, many database system implementers prefer the structural simplicity of a B+-tree. The exercises explore details of the insertion and deletion algorithms for B-trees.

Static Hashing

One disadvantage of sequential file organization is that we must access an index structure to locate data, or must use binary search, and that results in more I/O operations. File organizations based on the technique of hashing allow us to avoid accessing an index structure. Hashing also provides a way of constructing indices. We study file organizations and indices based on hashing in the following sections.

Hash File Organization

In a hash file organization, we obtain the address of the disk block containing a desired record directly by computing a function on the search-key value of the record. In our description of hashing, we shall use the term bucket to denote a unit of storage that can store one or more records. A bucket is typically a disk block, but could be chosen to be smaller or larger than a disk block. Formally, let K denote the set of all search-key values, and let B denote the set of all bucket addresses. A hash function h is a function from K to B.

To insert a record with search key Ki, we compute h(Ki), which gives the address of the bucket for that record. Assume for now that there is space in the bucket to store the record. Then, the record is stored in that bucket. To perform a lookup on a search-key value Ki, we simply compute h(Ki), then search the bucket with that address. Suppose that two search keys, K5 and K7, have the same hash value; that is, h(K5) = h(K7). If we perform a lookup on K5, the bucket h(K5) contains records with search-key value K5 and records with search-key value K7. Thus, we have to check the search-key value of every record in the bucket to verify that the record is one that we want.

Deletion is equally straightforward. If the search-key value of the record to be deleted is Ki, we compute h(Ki), then search the corresponding bucket for that record, and delete the record from the bucket.
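The insert/lookup/delete cycle above can be sketched as a toy in-memory hash file (the bucket count, the hash function, and the record shape are our assumptions):

```python
# Each bucket is a list of (key, record) pairs; h maps a search key to a
# bucket address. Lookups and deletions must still compare keys inside the
# bucket, since distinct keys may share a hash value.
NB = 4
buckets = [[] for _ in range(NB)]

def h(key):
    return sum(key.encode()) % NB    # hash on the key's byte values

def insert(key, record):
    buckets[h(key)].append((key, record))

def lookup(key):
    return [rec for (k, rec) in buckets[h(key)] if k == key]

def delete(key):
    b = buckets[h(key)]
    b[:] = [(k, rec) for (k, rec) in b if k != key]

insert("A-101", {"balance": 500})
insert("A-215", {"balance": 700})
print(lookup("A-101"))               # → [{'balance': 500}]
delete("A-101")
print(lookup("A-101"))               # → []
```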

Hash Functions

The worst possible hash function maps all search-key values to the same bucket. Such a function is undesirable because all the records have to be kept in the same bucket. A lookup has to examine every such record to find the one desired. An ideal hash function distributes the stored keys uniformly across all the buckets, so that every bucket has the same number of records. Since we do not know at design time precisely which search-key values will be stored in the file, we want to choose a hash function that assigns search-key values to buckets in such a way that the distribution has these qualities:

  • The distribution is uniform. That is, the hash function assigns each bucket the same number of search-key values from the set of all possible search-key values.
  • The distribution is random. That is, in the average case, each bucket will have nearly the same number of values assigned to it, regardless of the actual distribution of search-key values. More precisely, the hash value will not be correlated to any externally visible ordering on the search-key values, such as alphabetic ordering or ordering by the length of the search keys; the hash function will appear to be random.

As an illustration of these principles, let us choose a hash function for the account file using the search key branch-name. The hash function that we choose must have the desirable properties not only on the example account file that we have been using, but also on an account file of realistic size for a large bank with many branches.

Assume that we decide to have 26 buckets, and we define a hash function that maps names beginning with the ith letter of the alphabet to the ith bucket. This hash function has the virtue of simplicity, but it fails to provide a uniform distribution, since we expect more branch names to begin with such letters as B and R than Q and X, for example.

Now suppose that we want a hash function on the search key balance. Suppose that the minimum balance is 1 and the maximum balance is 100,000, and we use a hash function that divides the values into 10 ranges, 1–10,000, 10,001–20,000, and so on. The distribution of search-key values is uniform (since each bucket has the same number of different balance values), but is not random: records with balances between 1 and 10,000 are far more common than records with balances between 90,001 and 100,000. As a result, the distribution of records is not uniform; some buckets receive more records than others do. If the function has a random distribution, even if there are such correlations in the search keys, the randomness of the distribution will make it very likely that all buckets will have roughly the same number of records, as long as each search key occurs in only a small fraction of the records. (If a single search key occurs in a large fraction of the records, the bucket containing it is likely to have more records than other buckets, regardless of the hash function used.)

Typical hash functions perform computation on the internal binary machine representation of characters in the search key. A simple hash function of this type first computes the sum of the binary representations of the characters of a key, then returns the sum modulo the number of buckets. Figure shows the application of such a scheme, with 10 buckets, to the account file, under the assumption that the ith letter in the alphabet is represented by the integer i.
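The scheme just described can be sketched directly (the letter-position encoding a = 1, ..., z = 26 follows the text's ith-letter assumption; the bucket count of 10 matches the figure):

```python
# Sum the letter positions of the branch name, then take the sum modulo
# the number of buckets.
def branch_hash(name, n_buckets=10):
    total = sum(ord(c) - ord('a') + 1 for c in name.lower() if c.isalpha())
    return total % n_buckets

print(branch_hash("Perryridge"))   # → 5 (the letter positions sum to 125)
```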

Hash functions require careful design. A bad hash function may result in lookup taking time proportional to the number of search keys in the file. A well-designed function gives an average-case lookup time that is a (small) constant, independent of the number of search keys in the file.

Handling of Bucket Overflows

So far, we have assumed that, when a record is inserted, the bucket to which it is mapped has space to store the record. If the bucket does not have enough space, a bucket overflow is said to occur. Bucket overflow can occur for several reasons:

  • Insufficient buckets. The number of buckets, which we denote nB, must be chosen such that nB > nr/fr, where nr denotes the total number of records that will be stored, and fr denotes the number of records that will fit in a bucket. This designation, of course, assumes that the total number of records is known when the hash function is chosen.
  • Skew. Some buckets are assigned more records than are others, so a bucket may overflow even when other buckets still have space. This situation is called bucket skew. Skew can occur for two reasons.
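The bucket-count rule in the first item can be applied with one line of arithmetic (the record and blocking numbers are made up):

```python
# Smallest n_B satisfying n_B > n_r / f_r: e.g. 10,000 records at
# 20 records per bucket require more than 500 buckets.
def min_buckets(n_r, f_r):
    return n_r // f_r + 1

print(min_buckets(10_000, 20))   # → 501
```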

Hash organization of account file, with branch-name as the key.
