INSERT / SELECT Command Teradata

Although the INSERT is great for adding a single row that is not already in the system, an INSERT/SELECT is even better when the data already exists within Teradata. In this case, the INSERT is combined with a SELECT. However, no rows are returned to the user; instead, they go into the table as new rows.

The SELECT reads the data values from one or more columns in one or more tables and uses them as the values to INSERT into another table. Simply put, the SELECT takes the place of the VALUES portion of the INSERT.

This is a common technique for building data marts, interim tables and temporary tables. It is normally a better and much faster alternative than extracting the rows to a data file, then reading the data file and inserting the rows using a utility.

If the data needed is already in the data warehouse, why reload it? Instead select it and insert it.

The syntax of the INSERT / SELECT follows:
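The original syntax diagram is not reproduced here; a generic form consistent with the examples in this section is shown below (bracketed portions are optional, and the names in angle brackets are placeholders):

INSERT INTO <table-name>
  [ ( <column-name> [, <column-name> ... ] ) ]
SELECT <expression> [, <expression> ... ]
FROM <table-name> [, <table-name> ... ]
[ WHERE <condition> ] ;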

When the goal is an exact copy of the second table, and both tables have exactly the same number of columns in the same order with the same data types, an * may be used in the SELECT to read all columns without a WHERE clause, as in the next example:

INSERT INTO My_table
SELECT * FROM My_original_table;

Like all SELECT operations without a WHERE clause, a full table scan occurs and all the rows of My_original_table are inserted into My_table.

When fewer than all the columns are desired, either of the following INSERT / SELECT statements will do the job:
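For instance, assume hypothetical tables in which the first three columns are Column1, Column2 and Column3 and the last column, Sale_date, holds a date (these names are made up for illustration). The two statements might look like this:

INSERT INTO My_table (Column1, Column2, Column3, Sale_date)
SELECT Column1, Column2, Column3, DATE '2004-10-01'
FROM My_original_table;

INSERT INTO My_table (Column1, Column2, Column3, Sale_date)
SELECT Column1, Column2, Column3, CURRENT_DATE
FROM My_original_table;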

In both of the above examples, only the first three and the last columns are receiving data. In the first INSERT, the data is a literal date. The second INSERT uses the CURRENT_DATE. Both are acceptable, depending on what is needed.

Working with the same concept as a normal INSERT: when column names are used, data values are needed only for those columns, and they must be in the same sequence as the column list, not the sequence in the CREATE TABLE. Any column omitted from the list receives a NULL data value.

Since the second part of this INSERT is a SELECT, it can contain any of the functions of a SELECT. It might be an inner or outer join, subquery or correlated subquery. The SELECT also may contain aggregates, literals, substrings or derived data.

Regardless, it is still very important to list the selected data values in a sequence that matches the columns being loaded: the column list of the INSERT if one is given, otherwise the column sequence of the CREATE TABLE. The columns in the SELECT must line up, one for one, with the columns being inserted.

As an example of a data mart, it might be desirable to build a summary table using something like the following:
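For example, a sales summary per product might be built like this (the table and column names here are invented for illustration):

INSERT INTO Sales_summary
SELECT Product_id
      ,SUM(Sale_amt)
      ,AVG(Sale_amt)
FROM Sales_table
GROUP BY Product_id;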

However used, the INSERT / SELECT is a powerful tool for creating rows from the rows already contained in one or more other tables.

Fast Path INSERT / SELECT

When the table being loaded is empty, the INSERT / SELECT is very fast. This is especially true when all columns and all rows are being copied. Remember, the table being loaded must be empty to attain the speed. If there is even one row already in the table, it negates the ability to take the Fast Path.

There are two reasons behind this speed. First, there is no need to record an identifier for each inserted row in the Transient Journal. Recovery, if needed, simply empties the table again. No other type of recovery can be easier or faster.

Second, when all columns and all rows are requested from the existing table and they exactly match the columns in the new table, there is no need to use spool. The rows go straight into the table being loaded. Additionally, when all rows are being selected Teradata does not bother to read the individual rows. Instead, each AMP literally copies the blocks of the original table to blocks for the new table.

These reasons are why it is called the Fast Path. To use this technique, the order of the columns in both tables must match exactly, and so must the data types; otherwise, spool must be used to rearrange the data values or to convert from one data type to the other.

What if it is necessary to retrieve the rows from multiple tables for the INSERT?

Multiple INSERT / SELECT operations could be performed as follows:
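For example, with hypothetical source tables Original_table1, Original_table2 and Original_table3, the statements might be:

INSERT INTO My_table SELECT * FROM Original_table1;
INSERT INTO My_table SELECT * FROM Original_table2;
INSERT INTO My_table SELECT * FROM Original_table3;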

The first INSERT/SELECT into My_table loads the empty table extremely fast, even with millions of rows. However, the table is then no longer empty, and the subsequent INSERT operations are much slower because they cannot use the Fast Path: every inserted row must be identified in the Transient Journal. This can more than double the processing time.

The real question is: How to make all of the individual SELECT operations act as one so that the table stays empty until all rows are available for the INSERT?

One way to do this uses the UNION command to perform all SELECT operations in parallel before the first row is inserted into the new table. Therefore all rows are read from the various tables, combined into a single answer set in spool and then loaded into the empty table. All of this is done at high speed.

For instance, if all the rows from three different tables are needed to populate the new table, the applicable statement might look like the following:
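With hypothetical source tables Original_table1, Original_table2 and Original_table3, the statement might look like this:

INSERT INTO My_table
SELECT * FROM Original_table1
UNION ALL
SELECT * FROM Original_table2
UNION ALL
SELECT * FROM Original_table3;

UNION ALL is shown here because plain UNION also eliminates duplicate rows across the answer sets; use UNION instead when duplicates should be removed.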

Again, the above statement assumes that all four tables have exactly the same columns. That may seldom be the case in real life, but it serves as an example. By now we know that the columns in the SELECT must match the columns in the table being loaded, no matter how that is accomplished.

A second alternative method is available using BTEQ. The key here is that BTEQ can run multiple SQL statements as a single transaction. The trick is to delay the actual INSERT operations until they can all be performed as part of the same transaction, so that every one of them sees an empty table.

The BTEQ trick mentioned above is achieved in this manner:
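A sketch of the technique, again with hypothetical table names; notice that each semi-colon shares a line with the next INSERT instead of ending its own line:

INSERT INTO My_table SELECT * FROM Original_table1
;INSERT INTO My_table SELECT * FROM Original_table2
;INSERT INTO My_table SELECT * FROM Original_table3;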

By having the next SQL command on the same line as the semi-colon (;) of the previous one, BTEQ makes them all part of the same multi-statement transaction. Therefore, every INSERT is loading into an empty table, and the whole operation is much faster than running each INSERT individually. Now you know the secret too.

This does not work as fast in Queryman, because Queryman treats each INSERT as a separate transaction. Therefore, the table is no longer empty once the first INSERT transaction completes; the Transient Journal must then be maintained for rollback, and all subsequent INSERT operations are much slower.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
