Product Join Teradata

It is very important to use an equal condition in the WHERE clause. Otherwise you get a product join. This means that one row of a table is joined to multiple rows of another table. A mathematic product means that multiplication is used.

The next join example uses a WHERE clause, but it only limits which rows participate in the join and does not provide a join condition:


5 Rows Returned

product joinproduct join

The above output resulted from 1 row in the customer table being joined to all the rows of the order table. The WHERE limited the customer rows that participated in the join, but did not specify an equal comparison between the join columns. As a result, it looks like Billy placed five orders, which is not correct. So, be careful when using product joins because SQL answers the question as asked, not necessarily as intended.

When all rows of one table are joined to all rows of another table, it is called a Cartesian product join or an unconstrained product join. Think about this: if one table has one million rows and the other table contains one thousand rows, the output is one trillion rows (1,000,000 rows * 1,000 rows = 1,000,000,000 rows).

As seen above, the vast majority of the time, a product join has no meaningful output and is usually a mistake. The mistake is either that the WHERE clause is omitted, a column comparison is omitted for one of the tables using an AND, or the table is given an alias and the alias is not used (system thought it was an additional table without a comparison).

The next SELECT is the same as the one above, except this time the entire WHERE clause has been commented out using /* and */:

Since the join condition is converted into a comment, the output from the SELECT is a Cartesian product that will return 980 rows (10*70*14=980) using these very small tables. The output is completely meaningless and implies that every student is taking every class. This output does not reflect the correct situation.

Forgetting to include the WHERE clause does not make the join syntax incorrect. Instead, it results in a Cartesian product join. Always use the EXPLAIN to verify that the result of the join is reasonable before executing the actual join. The following shows the output from an EXPLAIN of the above classic Cartesian product join. Notice that steps 6 and 7 indicate a product join on the condition that (1=1). Since 1 is always equal to 1 every time a row is read, all rows are joined with all rows.
Explanation

  1. First, we lock a distinct MIKEL."pseudo table" for read on a RowHash to prevent global deadlock for MIKEL.SC.
  2. Next, we lock a distinct MIKEL."pseudo table" for read on a RowHash to prevent global deadlock for MIKEL.C.
  3. We lock a distinct MIKEL."pseudo table" for read on a RowHash to prevent global deadlock for MIKEL.S.
  4. We lock MIKEL.SC for read, we lock MIKEL.C for read, and we lock MIKEL.S for read.
  5. We do an all-AMPs RETRIEVE step from MIKEL.C by way of an all-rows scan with no residual conditions into Spool 2, which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 28 rows. The estimated time for this step is 0.15 seconds.
  6. We do an all-AMPs JOIN step from MIKEL.S by way of an all-rows scan with no residual conditions, which is joined to Spool 2 (Last Use). MIKEL.S and Spool 2 are joined using a product join, with a join condition of ("(1=1)"). The result goes into Spool 3, which is duplicated on all AMPs. The size of Spool 3 is estimated with high confidence to be 280 rows. The estimated time for this step is 0.20 seconds.
  7. We do an all-AMPs JOIN step from MIKEL.SC by way of an all-rows scan with no residual conditions, which is joined to Spool 3 (Last Use). MIKEL.SC and Spool 3 are joined using a product join, with a join condition of ("(1=1)"). The result goes into Spool 1, which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1. The size of Spool 1 is estimated with high confidence to be 980 rows. The estimated time for this step is 0.21 seconds.
  8. Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.56 seconds.

If you remember from, the EXPLAIN shows immediately that this situation will occur if the SELECT is executed. This is better than waiting, potentially hours, to determine that the SELECT is running too long, stealing valuable computer cycles, doing data transfer, and interfering with valid SQL from other users. Be a good corporate citizen and database user: EXPLAIN your join syntax before executing! Make sure the estimates are reasonable for the size of the database tables involved.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Teradata Topics