Using Secondary Indices Teradata

The Primary index is always done at table create time. The Primary Index is the mechanism used to distribute the table rows evenly across the AMPs. Every table in Teradata must have one and only one Primary Index and it is the fastest way to retrieve data. A table can also have up to 32 secondary indices. All indices can be single column or multiple columns. A multi-column secondary index can be any combination of up to sixteen columns.

A secondary index becomes an alternate read path to the data. They can be an excellent way to speed up queries. A secondary index can be defined as a Unique Secondary Index (USI) or Non-Unique Secondary Index (NUSI).

The following is the syntax for creating a secondary index:

The example below does not specify UNIQUE and therefore creates a non-unique secondary index on the dept column of the table Employee.

CREATE INDEX(dept) on TomC.Employee;

The next example creates a unique secondary index(USI) on the combination of first and last names with an index name of name_idx_1in the Employee table.

CREATE UNIQUE INDEX name_idx_1 (fname, lname) on TomC.Employee;

Why would someone choose to name an index? It is easier to drop if it is a multi-column index. You would just use the syntax:

DROP INDEX name_idx_1 on TomC.Employee

The next example creates a Non-unique secondary index (NUSI) on the last name and assigns a name of name_idx_2 for it:

CREATE INDEX name_idx_2 (lname) on TomC.Employee;

When initially creating an USI, the rows of the table must all be read using a full table scan. During this process, if a duplicate value is encountered, the CREATE INDEX command fails. All duplicate values must be eliminated before an USI can be created.

The table Employee now has three secondary indices. A non-unique index on the column dept, a unique index on the fname and lname combination, and a non-unique index that is named name_idx on lname.

Remember, we suggest that you name any index that uses more than one column. If you wish to name an index with one column, the suggestion is to use a name that is shorter than the column name.

You can also drop an index that is not named. The following drops the index on a multicolumn key in the Employee table:

Drop index (lname,fname) on TomC.Employee;

A classical secondary index is itself a table made up of rows having two main parts. The first is the data column itself inside the secondary index table, and the second part is a pointer showing the locations of the row in the base table. Because Teradata is a parallel architecture, it requires a different means for distributing and retrieving its data – hashing.

Teradata has a very clever way of utilizing unique secondary indices. When a secondary index command is entered Teradata hashes the secondary index column value for each row and place the hash in a secondary index subtable along with the ROW-ID that points the base row where the desired value resides. This approach allows for all USI requests in the WHERE clause of SQL to become two-AMP operations. A NUSI used in the WHERE clause still requires all AMPs, but the AMPs can easily check the secondary index subtable to see if they have one or more qualifying rows.

Teradata creates a different secondary index subtable for each secondary index placed on a table. The price you pay is disk space and overhead. The disk space comes from PERM for the secondary index subtables and there is overhead associated with keeping them current when a data row is changed or inserted. There are three values stored in every secondary index subtable row. They are:

  • Secondary Index data value
  • Secondary Index Row-ID (This is the hashed version of the value)
  • Primary Index Row-ID (This locates the AMP and the row of the base row)

Hashing the secondary index value and storing it in the secondary subtable is a fast and efficient way to gain an alternate path to the data. This is extremely efficient unless the query asks for a range of values to be selected. For example, many queries involving dates ask for a range based on a start and end date using the WHERE clause. The query might try to find all orders where the order_date is between December 25, 2001 and December 31, 2001. A secondary index done on a DATE field does not allow for a range unless it is VALUEORDEREDinstead of HASH-ORDERED.

We suggest that all DATE columns, where you want a secondary index for range queries, should specify the secondary indices to be VALUE ORDERED. A value ordered NUSI stores the subtable rows in sequence by the data value and not by the row hash value. Therefore, range checks work very well. The golden rule for Value-Ordered secondary indices is that they can only be performed on a single column NUSI that is four-bytes or less. Valid data types supported are:

  • DATE

Here is an example of creating a value-ordered secondary index on Hire_date.

CREATE INDEX (Hire_date) ORDER BY VALUES on TomC.Employee;

There is no such thing as a value ordered USI. So, do not use the UNIQUE keyword when attempting to create this type of index. There are only Value-Ordered NUSI's. If statistics are not collected for a NUSI column the optimizer will never use the NUSI when referenced in a WHERE clause. Also, a composite NUSI (multi-column) will almost never be used, even with statistics. Instead, it is usually better to make multiple NUSI indices and collect statistics on all of them. This allows Teradata to use what is called Bitmap Set Manipulation (BMSMS in the EXPLAIN output). This uses the selectivity of multiple NUSIs together to make them highly selective and therefore used for row retrieval. .

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd Protection Status

Teradata Topics