PRIMARY INDEX considerations Teradata

Our examples have had a table-level constraint of UNIQUE PRIMARY INDEX (UPI) on the column called emp. You must select a PRIMARY INDEX for a table at CREATE TABLE time or Teradata will choose one for you. There are two types of PRIMARY INDEX, UNIQUE and NON-UNIQUE, referred to as UPI and NUPI (pronounced ‘you-pea’ and ‘new-pea’). We have seen an example of a UNIQUE PRIMARY INDEX (UPI). Let us show you an example of a NON-UNIQUE PRIMARY INDEX (NUPI).
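A minimal sketch of a NUPI definition follows. Only the column emp is named in the text; the table name employee_nupi and the remaining columns and data types are assumptions made for illustration.

```sql
-- Hypothetical table: everything except the emp column is an assumption.
CREATE TABLE employee_nupi
( emp        INTEGER
 ,dept       INTEGER
 ,lname      CHAR(20)
 ,fname      VARCHAR(20)
 ,salary     DECIMAL(10,2)
 ,hire_date  DATE
)
PRIMARY INDEX (dept);   -- no UNIQUE keyword, so this is a NUPI
```

Because many employees can share one dept value, the index is non-unique and all rows with the same dept hash to the same AMP.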

Teradata also allows multicolumn Primary Indexes. Current releases accept up to 64 columns in a Primary Index; older releases capped this at 16. Here is an example of a multicolumn Primary Index.
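The sketch below shows one way a multicolumn Primary Index could be declared. The table name employee_multi_pi and every column other than emp are hypothetical.

```sql
-- Hypothetical table: the combination (emp, dept) forms the Primary Index.
CREATE TABLE employee_multi_pi
( emp        INTEGER
 ,dept       INTEGER
 ,lname      CHAR(20)
 ,fname      VARCHAR(20)
 ,salary     DECIMAL(10,2)
 ,hire_date  DATE
)
UNIQUE PRIMARY INDEX (emp, dept);
```

With a multicolumn PI, the values of all listed columns are hashed together, so a query gets single-AMP access only when it supplies a value for every PI column.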

The data value stored in the column(s) of the PRIMARY INDEX (PI) is used by Teradata to spread the rows among the AMPs. The Primary Index determines which AMP stores an individual row of a table. The PI data is converted into the Row Hash using a mathematical hashing formula. The result is used as an offset into the Hash Map to determine the AMP number. Since the PI value determines how the data rows are distributed among the AMPs, requesting a row using the PI value is always the most efficient retrieval mechanism for Teradata.
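The hashing steps described above can be inspected with Teradata's built-in hash functions. The following is a sketch that assumes a table named employee with emp as its Primary Index; the literal 1001 is a made-up value.

```sql
-- HASHROW computes the Row Hash of the PI value, HASHBUCKET maps that
-- hash to a Hash Map bucket, and HASHAMP returns the AMP for the bucket.
-- Counting rows per AMP shows how evenly the PI spreads the data.
SELECT HASHAMP(HASHBUCKET(HASHROW(emp))) AS amp_no
      ,COUNT(*)                          AS row_count
FROM employee
GROUP BY 1
ORDER BY 1;

-- Retrieval by PI value: the value is hashed the same way,
-- so only the one AMP that owns the row is asked for it.
SELECT *
FROM employee
WHERE emp = 1001;   -- hypothetical employee number
```

A heavily uneven row_count across AMPs suggests the chosen PI column has too few distinct values or a skewed distribution.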

If you don't specify a PI at table create time then Teradata must choose one. For instance, if the DDL is ported from another database that uses a Primary Key instead of a Primary Index, the CREATE TABLE contains a PRIMARY KEY (PK) constraint. Teradata is smart enough to know that Primary Keys must be unique and cannot be null. So, the first level of default is to use the PRIMARY KEY column(s) as a UPI. If the DDL defines no PRIMARY KEY, Teradata looks for a column defined as UNIQUE. As a second level default, Teradata uses the first column defined with a UNIQUE constraint as a UPI.
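As an illustration of the two defaults just described, here is a sketch using hypothetical tables employee_pk and employee_uq (the column badge_no is also invented). Teradata requires PRIMARY KEY and UNIQUE columns to be declared NOT NULL. Newer releases can be configured to apply a different default, so it is worth checking the result with SHOW TABLE.

```sql
-- First-level default: no PRIMARY INDEX clause, so the
-- PRIMARY KEY column (emp) is used as a UPI.
CREATE TABLE employee_pk
( emp    INTEGER NOT NULL PRIMARY KEY
 ,dept   INTEGER
 ,lname  CHAR(20)
);

-- Second-level default: no PRIMARY INDEX and no PRIMARY KEY, so the
-- first column with a UNIQUE constraint (badge_no) is used as a UPI.
CREATE TABLE employee_uq
( emp       INTEGER
 ,badge_no  INTEGER NOT NULL UNIQUE
 ,lname     CHAR(20)
);

-- Display the stored definitions to see which index was chosen.
SHOW TABLE employee_pk;
SHOW TABLE employee_uq;
```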

If none of the above attributes are found, Teradata uses the first column defined in the table as a NON-UNIQUE PRIMARY INDEX (NUPI).

The next CREATE TABLE statement builds a table definition for a table called employee, but does not define a Primary Index. Which column do you think it will choose?
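The statement in question might look like the sketch below. Only the table name employee and the column emp come from the text; the other columns and data types are assumptions.

```sql
-- No PRIMARY INDEX clause, no PRIMARY KEY, no UNIQUE constraint.
CREATE TABLE employee
( emp        INTEGER
 ,dept       INTEGER
 ,lname      CHAR(20)
 ,fname      VARCHAR(20)
 ,salary     DECIMAL(10,2)
 ,hire_date  DATE
);
```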

Since there is no PI listed, Teradata must choose one. The request does not define a PK, nor is there a UNIQUE constraint. As a result, Teradata utilizes the first column (emp) as a NUPI. We suggest you always name the PI specifically in the DDL. That way there is no confusion about which column(s) are intended to be the primary index.

Table Type Specifications of SET VS MULTISET

There are two different table type philosophies, so there are two different types of tables. They are SET and MULTISET. It has been said, "A man with one watch knows the time, but a man with two watches is never sure". When Teradata was originally designed it did not allow duplicate rows in a table. If two rows in the same table had the same values in every column, Teradata would throw one of the rows out. The designers believed a second row was a mistake. Why would someone need two watches and why would someone need two rows exactly the same? This is SET theory and a SET table kicks out duplicate rows.
The ANSI standard believed in a different philosophy. If two rows are entered into a table that are exact duplicates then this is acceptable. If a person wants to wear two watches then they probably have a good reason. This is a MULTISET table and duplicate rows are allowed. If you do not specify SET or MULTISET, one is used as a default. Here is the issue: the default in Teradata mode is SET and the default in ANSI mode is MULTISET.

Therefore, to eliminate confusion it is important to explicitly define which one is desired. Otherwise, you must know in which mode the CREATE TABLE will execute so that the correct type is used for each table. The implications of using a SET or MULTISET table are discussed further below.

SET and MULTISET Tables

A SET table does not allow duplicate rows so Teradata checks to ensure that no two rows in a table are exactly the same. This can be a burden. One way around the duplicate row check is to have a column in the table defined as UNIQUE. This could be a Unique Primary Index (UPI), Unique Secondary Index (USI) or even a column with a UNIQUE or PRIMARY KEY constraint. Since all must be unique, a duplicate row may never exist. Therefore, the check on either the index or constraint eliminates the need for the row to be examined for uniqueness. As a result, inserting new rows can be much faster by eliminating the duplicate row check.
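One way to apply this is to add a Unique Secondary Index to a SET table whose Primary Index is non-unique. The sketch below assumes a hypothetical table employee_set_usi; only the emp column comes from the text.

```sql
-- SET table with a NUPI on dept: by itself this would require
-- a duplicate row check on inserts.
CREATE SET TABLE employee_set_usi
( emp    INTEGER NOT NULL
 ,dept   INTEGER
 ,lname  CHAR(20)
)
PRIMARY INDEX (dept);

-- A Unique Secondary Index on emp guarantees that no two rows can be
-- identical, so the uniqueness check on emp stands in for the
-- row-by-row duplicate comparison.
CREATE UNIQUE INDEX (emp) ON employee_set_usi;
```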

However, if the table is defined with a NUPI and the table uses SET as the table type, a duplicate row check must be performed. Since SET tables do not allow duplicate rows, a check must be performed every time a NUPI DUP (a row whose NUPI value duplicates that of an existing row) is inserted or updated in the table. Do not be fooled! A duplicate row check can be a very expensive operation in terms of processing time. This is because every new row inserted must be compared with every existing row that has the same NUPI Row Hash value. The cost grows with each row added: the nth row with a given NUPI value must be compared against the n-1 rows already stored under that Row Hash, so the total work for n such rows grows roughly with the square of n.

What is the solution? There are two: either make the table a MULTISET table (only if you want duplicate rows to be possible) or define at least one column or combination of columns as UNIQUE. If neither is an option then the SET table with no unique columns will work, but inserts and updates will take more time because of the mandatory duplicate row check.

Below is an example of creating a SET table:
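This is a sketch of such a definition. The table name employee_set and all columns other than emp are assumptions.

```sql
-- SET is stated explicitly, so the session mode does not matter.
CREATE SET TABLE employee_set
( emp        INTEGER
 ,dept       INTEGER
 ,lname      CHAR(20)
 ,fname      VARCHAR(20)
 ,salary     DECIMAL(10,2)
 ,hire_date  DATE
)
UNIQUE PRIMARY INDEX (emp);
```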

Notice the UNIQUE PRIMARY INDEX on the column emp. Because this is a SET table it is much more efficient to have at least one unique key so the duplicate row check is eliminated.

The following is an example of creating the same table as before, but this time as a MULTISET table:
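A sketch of the MULTISET version follows. It uses the hypothetical name employee_multiset so that it can coexist with other tables in the same database; the columns other than emp and the inserted values are invented.

```sql
-- MULTISET is stated explicitly, so the session mode does not matter.
CREATE MULTISET TABLE employee_multiset
( emp        INTEGER
 ,dept       INTEGER
 ,lname      CHAR(20)
 ,fname      VARCHAR(20)
 ,salary     DECIMAL(10,2)
 ,hire_date  DATE
)
PRIMARY INDEX (emp);   -- no UNIQUE keyword, so this is a NUPI

-- Both inserts succeed: the rows are identical in every column,
-- and a MULTISET table with a NUPI accepts them.
INSERT INTO employee_multiset
VALUES (1001, 10, 'Smith', 'Jane', 50000.00, DATE '2018-01-15');
INSERT INTO employee_multiset
VALUES (1001, 10, 'Smith', 'Jane', 50000.00, DATE '2018-01-15');
```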

Notice also that the PI is now a NUPI because it does not use the word UNIQUE. This is important! As mentioned previously, if a UPI is requested, no duplicate rows can be inserted, so the table acts more like a SET table. This MULTISET example allows duplicate rows. Because duplicates are permitted, inserts do not incur the duplicate row check.



All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
