Correlated Subquery Processing Teradata

The correlated subquery is a very powerful tool. It is an excellent technique to use when there is a need to determine which rows to SELECT based on one or more values from another table. This is especially true when the value for comparison is based on an aggregate. It combines subquery processing and join processing into a single request.

For example, one Teradata user has the need to bill their customers and incorporate the latest payment date. Therefore, the latest date needs to be obtained from the table. So, the payment date is found using the MAX aggregate in the subquery. However, it must be the latest payment date for that customer, which might be different for each customer. The processing involves the subquery locating the maximum date only for one customer account.

The correlated subquery is perfect for this processing. It is more efficient and faster than using a normal subquery with multiple values. One reason for its speed is that it can perform some processing steps in parallel, as seen in an EXPLAIN. The other reason is that it only finds the maximum date when a particular account is read for processing, not for all accounts like a normal subquery.

The operation for a correlated subquery differs from that of a normal subquery. Instead of comparing the selected subquery values against all the rows in the main query, the correlated subquery works backward. It first reads a row in the main query, and then goes into the subquery to find all the rows that match the specified column value. Then, it gets the next row in the main query and retrieves all the subquery rows that match the next value in this row. This processing continues until all the qualifying rows from the main SELECT are satisfied.

Although this sounds terribly inefficient and is inefficient on other databases, it is extremely efficient in Teradata. This is due to the way the AMPs handle this type of request. The AMPs are smart enough to remember and share each value that is located.

Thus, when a second row comes into the comparison that contains the same value as an earlier row, there is no need to re-read the matching rows again. That operation has already been done once and the AMPs remember the answer from the first comparison.

The following is the syntax for writing a correlated subquery:

The subquery does not have a semi-colon of its own. The SELECT in the subquery is all part of the same primary query and shares the one semi-colon.

The aggregate value is normally obtained using MIN, MAX or AVG. Then this aggregate value is in turn used to locate the row or rows within a table that compares equals, less than or greater than this value.

This table is used to demonstrate correlated sub queries:

demonstrate correlated sub queries

Using the above table, this Correlated subquery finds the highest paid employee in each department:

5 Rows ReturnedCorrelated subquery finds the highest paid employee in each department

Notice that both of the tables have been assigned alias names (emp for the main query and emt for the correlated subquery). Because the same Employee table is used in the main query and the subquery, one of them must be assigned an alias. The aliases are used in the subquery to qualify and match the common domain values between the two tables. This coding technique "correlates" the main query table to the one in the subquery.

The following Correlated subquery uses the AVG function to find all employees who earn less than the average pay in their department:

5 Rows ReturnedCorrelated subquery uses the AVG function to find all employees

Earlier in this chapter, it was indicated that a column from the subquery cannot be referenced in the main query. This is still true. However, nothing is wrong with using one or more column references from the main query within the subquery to create a Correlated subquery.

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd Protection Status

Teradata Topics