Adding Residual Conditions to a Join Teradata

Most of the examples in this book have included all rows from the tables being joined. However, in the world of Teradata with millions of rows being stored in a single table, additional comparisons are probably needed to reduce the number of rows returned. There are two ways to code residual conditions. They are: the use of a compound condition using the ON, or a WHERE clause may be used in the new JOIN. These residual conditions are in addition to the join equality in the ON clause.

Consideration should be given to the type of join when including the WHERE clause. The following paragraphs discuss the operational aspects of mixing an ON with a WHERE for INNER and OUTER JOIN operations.

INNER JOIN

The WHERE clause works exactly the same when used with the INNER JOIN as it does on all other forms of the SELECT. It eliminates rows at read time based on the condition being checked and any index columns involved in the comparison.

Normally, as fewer rows are read, the faster the SQL will run. It is more efficient because fewer resources such as disk, I/O, cache space, spool space, and CPU are needed. Therefore, whenever possible, it is best to eliminate unneeded rows using a WHERE condition with an INNER JOIN. I like the use of WHERE because all residual conditions are located in one place.

The following samples are the same join that was performed earlier in this chapter. Here, one uses a WHERE clause and the other a compound comparison via the ON:


2 Rows Returned INNER JOIN

The output is exactly the same with both coding methods. This can be verified using the EXPLAIN. We recommend using the WHERE clause with an inner join because it consolidates all residual conditions in a single location that is easy to find when changes are needed. Although there are multiple ON comparisons, there is only one WHERE clause.

OUTER JOIN

Like the INNER JOIN, the WHERE clause can also be used with the OUTER JOIN. However, its processing is the opposite of the technique used with an INNER JOIN and other SQL constructs. If you remember, with the INNER JOIN the intent of the WHERE clause was to eliminate rows from one or all tables referenced by the SELECT.

When the WHERE clause is coded with an OUTER JOIN, it is executed last, instead of first. Remember, the OUTER JOIN returns exceptions. The exceptions must be determined using the join (matching and non-matching rows) and therefore rows cannot be eliminated at read time. Instead, they go into the join and into spool. Then, just before the rows are returned to the client, the WHERE checks to see if rows can be eliminated from the spooled join rows.

The following demonstrates the difference when using the same two techniques in the OUTER JOIN. Notice that the results are different:


7 Rows ReturnedOUTER JOIN
OUTER JOIN

Notice that only courses with SQL as part of the name are returned. Whereas the next SELECT using the same condition as a compound comparison has a different result:

11 Rows Returned SELECT using the same condition as a compound comparison has a different result
SELECT using the same condition as a compound comparison has a different result

The reason for the difference makes sense after you think about the functionality of the OUTER JOIN. Remember that an OUTER JOIN retains all rows from the outer table, those that match and those that do not match the ON comparison. Therefore, the row shows up, but as a non-matching row instead of as a matching row.

There is one last consideration when using a WHERE clause with an OUTER JOIN. Always use columns from the outer table in the WHERE. The reason: if columns of the inner table are referenced in a WHERE, the optimizer will perform an INNER JOIN and not an OUTER JOIN, as coded. It does this since no rows will be returned except those of the inner table. Therefore, an INNER JOIN is more efficient. The phrase "merge join" can found be in the EXPLAIN output instead of "outer join" to verify this event.

The next SELECT was executed earlier as an inner join and returned 2 rows. Here it has been converted to an outer join. However, the output from the EXPLAIN shows in step 5 that an inner (merge) join will be used because customer name is a column from the inner table (Customer table):

EXPLAIN

Explanation

  1. First, we lock a distinct MIKEL."pseudo table" for read on a RowHash to prevent global deadlock for MIKEL.cust.
  2. Next, we lock a distinct MIKEL."pseudo table" for read on a RowHash to prevent global deadlock for MIKEL.ord.
  3. We lock MIKEL.cust for read, and we lock MIKEL.ord for read.
  4. We do an all-AMPs RETRIEVE step from MIKEL.ord by way of an all-rows scan with no residual conditions into Spool 2, which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with low confidence to be 4 rows. The estimated time for this step is 0.03 seconds.
  5. We do an all-AMPs JOIN step from MIKEL.cust by way of a RowHash match scan with a condition of ("MIKEL.cust.Customer_name LIKE ‘Billy%’"), which is joined to Spool 2 (Last Use). MIKEL.cust and Spool 2 are joined using a merge joinwith a join condition of ( "MIKEL.cust.Customer_number = Customer_number"). The result goes into Spool 1, which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1. The size of Spool 1 is estimated with no confidence to be 4 rows. The estimated time for this step is 0.18 seconds.
  6. Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.22 seconds.


Face Book Twitter Google Plus Instagram Youtube Linkedin Myspace Pinterest Soundcloud Wikipedia

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Teradata Topics