Subquery Teradata

The subquery is a commonly used technique and powerful way to select rows from one table based on values in another table. It is predicated on the use of a SELECT statement within a SELECT and takes advantage of the relationships built into a relational database. The basic concept behind a subquery is that it retrieves a list of values that are used for comparison against one or more columns in the main query. To accomplish the comparison, the subquery is written after the WHERE clause and normally as part of an IN list.

In an earlier, the IN was used to build a value list for comparison against the rows of a table to determine which rows to select. The next example illustrates how this technique can be used to SELECT all the columns for rows containing any of the three different values 10, 20 and 30:

4 Rows ReturnedSubquery

As powerful as this is, what if the three values turned into a thousand values. That is too much work and too many opportunities to forget one of the values. Instead of writing the values manually, a subquery can be used to generate the values automatically.

The coding technique of a subquery replaces the values previously written in the IN list with a valid SELECT. Then the subquery SELECT dynamically generates the value list. Once the values have been retrieved, it eliminates the duplicates by automatically performing a DISTINCT.

The following is the syntax for a subquery:

Conceptually, the subquery is processed first so that all the values are expanded into the list for comparison with the column specified in the WHERE clause. These values in the subquery SELECT can only be used for comparison against the column or columns referenced in the WHERE.

Columns inside the subquery SELECT cannot be returned to the user via the main SELECT. The only columns available to the client are those in the tables named in the main (first) FROM clause. The query in parentheses is called the subquery and it is responsible for building the IN list.

At the writing of this document, Teradata allows up to 64 tables in a single query. Therefore, if each SELECT accessed only one table, a query might contain 63 subqueries in a single statement.

The next two tables are used to demonstrate the functionality of sub queries:

two tables are used to demonstrate the functionality of sub queries

order table

The next SELECT uses a subquery to find all customers that have an order of more than $10,000.00:

3 Rows ReturnedSELECT uses a subquery to find all customers

This is an appropriate place to mention that the columns being compared between the main and subqueries must be from the same domain. Otherwise, if no equal condition exists, no rows are returned. The above SELECT uses the customer number (FK) in the Order table to match the customer number (PK) in the Customer table. They are both customer numbers and therefore have the opportunity to compare equal from both tables.

The next subquery swaps the queries to find all the orders by a specific customer:

3 Rows Returnedsubquery swaps the queries to find all the orders by a specific customer

Notice that the Customer table is used in the main query to answer a customer question and the Order table is used in the main query to answer an order question. However, they both compare on the customer number as the common domain between the two tables.

Both of the previous subqueries work fine for comparing a single column in the main table to a value list in the subquery. Thus, it is possible to answer questions like, "Which customer has placed the largest order?" However, it cannot answer this question, "What is the maximum order for each customer?"

To make Subqueries more sophisticated and powerful, they can compare more than one column at a time. The multiple columns are referenced in the WHERE clause, of the main query and also enclosed in parentheses.

The key is this: if multiple columns are named before the IN portion of the WHERE clause, the exact same number of columns must be referenced in the SELECT of the subquery to obtain all the required values for comparison.

Furthermore, the corresponding columns (outside and inside the subquery) must respectively be of the same domain. Each of the columns must be equal to a corresponding value in order for the row to be returned. It works like an AND comparison.

The following SELECT uses a subquery to match two columns with two values in the subquery to find the highest dollar orders for each customer:

4 Rows Returnedsubquery to find the highest dollar orders for each customer

Although this works well for MIN and MAX type of values (equalities), it does not work well for finding values greater than or less than an average. For this type of processing, a Correlated subquery is the best solution and will be demonstrated later in this chapter.

Since 64 tables can be in a single Teradata SQL statement, as mentioned previously, this means that a maximum of 63 subqueries can be written into a single statement. The following shows a 3-table access using two separate subqueries. Additional subqueries simply follow the same pattern.

From the above tables, it is also possible to find the customer who has ordered the single highest dollar amount order. To accomplish this, the Order table must be used to determine the maximum order. Then, the Order table is used again to compare the maximum with each order and finally, compared to the Customer Table to determine which customer placed the order.

The next subquery can be used to find them:

1 Row Returnednext subquery

It is now known that XYZ Plumbing has the highest dollar order. What is not known is the amount of the order. Since the order total is in the Order table, which is not referenced in the main query, it cannot be part of the SELECT list.

Using NOT IN

As seen in a previous chapter, when using the IN and a value list, the NOT IN can be used to find all of the rows that do not match. Using this technique, the subquery above can be modified to find the customers without an order. The only changes made are to 1) add the NOT before the IN and 2) eliminate the WHERE clause in the subquery:

1 Row ReturnedUsing NOT IN

Caution needs to be used regarding the NOT IN when there is a potential for including a NULL in the value list. Since the comparison of a NULL and any other value is unknown, and the NOT of an unknown is still an unknown no rows are returned. Therefore when there is potential for a NULL in the subquery, it is best to also code a compound comparison as seen in the following SELECT:

Using Quantifiers

In other RDBMS systems and early Teradata versions, using an equality symbol (=) in a comparison normally proved to be more efficient than using an IN list. The reason was that it allowed for indices, if they existed, to be used instead of a sequential read of all rows. Today, Teradata automatically uses indices whenever they are more efficient. So, the use of quantifiers is optional and an IN works exactly the same.

Another powerful use for quantifiers is when using inequalities. It is sometimes necessary to find all rows that are greater than or less than one or more other values.

To use quantifiers, replace the IN with an =, <, >, ANY, SOME or ALL as demonstrated in the following syntax:

Earlier in this chapter, a two level subquery was used to find the customer who spent the most money on a single order. It used an IN list to find equal values. The next SELECT uses = ANY to find the same customers:

2 Rows ReturnedUsing Quantifiers

In order to accomplish this, the Order table is first used to determine the average order amount. Then, the Order table is used again to compare the average with each order and finally, compared to the Customer table to determine which of the customers qualify.

The quantifiers of SOME and ANY are interchangeable. However, the use of ANY conforms to ANSI standard and SOME is the Teradata extension. The = ANY is functionally equivalent to using an IN list.

The ALL and the = are more limited in their scope. In order for them to work, there can only be a single value from the subquery for each of the values in the WHERE clause. However, earlier the NOT IN was explored. When using quantifiers and the NOT, consider the following:

equivalency chart

Of these, the NOT = ALL takes the most thought. It forces the system to examine every value in the list to make sure that the value being compared is checked against all the values. Otherwise, as soon as any of the values is different, the row is returned without looking at the other values (ALL).
Although the above describes the conceptual approach of a subquery, the Teradata optimizer will normally use a join to optimize and locate the rows that are needed from within the database.

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd Protection Status

Teradata Topics