Joining - Firebird

Joining is used in SELECT statements to generate denormalized sets containing columns from multiple tables that store related data. The sets of columns extracted from each table are known as streams. The joining process merges the selected columns into a single output set. The simplest case is a two-table join, where two tables are linked by matching elements of a key that occurs in both tables.

The key columns for making the linkage are chosen on the basis that they implement the relationship unambiguously —that is, a key in the left-side table of the join will find only the rows in the right-side table that belong to the relationship, and it will find all of them. Typically, the linkage is made between a unique key in the left-side table (primary key or another unique key) and a formal or implied foreign key in the right-side table.

However, join conditions are not restricted to primary and foreign key columns, and the engine does not refuse to output duplicate rows. Duplicate rows may cause undesirable results. Please refer to the note on this subject in the “Optimization Topic” section.

The Inner Join

The following statement joins two tables that are related by a foreign key, FK, on the right-side table (Table2) that links to the primary key (PK) of Table1:

This is a specification for an inner join. We will examine the outer join presently. Figure shows the two streams as they exist in the tables and the set that is generated.

Inner join

Inner join

As the diagram indicates, the inner join merges the two streams so that any non-matching rows in either stream are discarded. Another name for the inner join is exclusive join, because its rules stipulate that non-matching pairs in both streams be excluded from the output.

The SQL standards describe two syntaxes for the inner join. The preceding example uses the more modern SQL-92 syntax, distinguished from the older, more limited SQL-89 as explicit join, because it uses an explicit JOIN clause to specify the join conditions.

SQL-89 Implicit INNER JOIN Syntax

Under SQL-89, the tables participating in the join are listed as a comma-separated list in the FROM clause of a SELECT query. The conditions for linking tables are specified among the search conditions in the WHERE clause. There is no special syntax to indicate which conditions are for searching and which are for joining. The join conditions are assumed to be self-evident. In hindsight, with the introduction of the JOIN clause, the old syntax came to be called the implicit joinsyntax.

The implicit join syntax can implement only the inner join—SQL implementations that do not support a JOIN clause cannot do outer joins.

Here is the previous example rewritten as an implicit join:

The implicit join is supported in Firebird for compatibility with legacy application code. It is not recommended for new work because it is inconsistent with the syntaxes for other styles of joins, making maintenance and self -documentation unnecessarily awkward. Some data access software, including drivers, may not handle the SQL-89 syntax well because of parsing problems distinguishing join and search conditions. It can be anticipated that it will be dropped from future standards.

SQL-92 Explicit INNER JOIN Syntax

The explicit inner join is preferred for Firebird and other RDBMSs that support it. If the optimizer is allowed to compute the query plan, there is no reason the SQL-92 syntax would perform better or worse than the older syntax, since the DSQL interpreter translates either statement into an identical binary language form for analysis by the optimizer.

Explicit joining makes statement code more readable and consistent with other styles of joins supported by SQL-92 and subsequent standards. It is sometimes referred to as a conditional join syntax because the JOIN ... ON clause structure enables the join conditions to be distinguished from search conditions. Not surprisingly, this usage of the term “conditional” can be confusing!

The keyword INNER is entirely optional and is usually omitted. Alone, JOIN has exactly the same meaning as INNER JOIN. (If JOIN is preceded by LEFT, RIGHT, or FULL, then it is not an inner join.)

Three or More Streams

If there are more than two streams (tables), just continue to add JOIN ... ON clauses for each relationship. The following example adds a third stream to the previous example, joining it to the second stream through another foreign key relationship:

Multi-column Key Linkages

If a single relationship is linked by more than one column, use the keyword AND to separate each join condition, just as you would do in a WHERE clause with multiple conditions. Take, for example, a table called TableA with a primary key (PK1, PK2) being linked to by TableB through a foreign key (FK1, FK2):

Mixing Implicit and Explicit Syntaxes

Writing statements that include a mixture of implicit and explicit syntaxes is illegal in Firebird 1.5 and allowable (but discouraged) in Firebird 1.0.x. The following is an example of how not to write a join statement:

Outer Joins

In contrast to the inner join, an outer join operation outputs rows from participating tables, even if no match is found in some cases. Wherever a complete matching row cannot be formed from the join, the “missing” data items are output as NULL. Another term for an outer join is inclusive join.

Each outer join has a left and right side, the left side being the stream that appears to the left of the JOIN clause, and the right side being the stream that is named as the argument of the JOIN clause. In a statement that has multiple joins, “leftness” and “rightness” may be relative—the right stream of one join clause may be the left stream of another, typically where the join specifications are “flattening” a hierarchical structure.

“Leftness” and “rightness” are significant to the logic of outer join specifications. Outer joins can be left, right, or full. Each type of outer join produces a different output set from the same input streams. The keywords LEFT, RIGHT, and FULL are sufficient to establish that the joins are “outer” and the keyword OUTER is an optional part of the syntax.

LEFT OUTER JOIN

A LEFT OUTER JOIN causes the query to output a set consisting of fully populated columns where matching rows are found (as in the inner join) and also partly populated rows for each instance where a right-side match is not found for the left-side key. The unmatchable columns are “filled” with NULL. Here is a statement using the same input streams as our INNER JOIN example:

Figure depicts how the streams are merged in a left join.

Left join

Left join

This time, instead of discarding the left stream rows for which there is no match in the right stream, the query creates a row containing the data from the left stream and NULL in each of the specified right-stream columns.

RIGHT OUTER JOIN

A RIGHT OUTER JOIN causes the query to output a set consisting of fully populated columns where matching rows were found (as in the inner join) and also partly populated rows for each instance where a right-side row exists with no corresponding set in the left-side stream. The unmatchable columns are “filled” with NULL. Here is a statement using the same input streams as our INNER JOIN example. The optional keyword OUTER is omitted here.

Figure depicts how the streams are merged in a right join.

Right join

Right join

FULL OUTER JOIN

The FULL OUTER JOIN is fully inclusive. It returns one row for each pair of matching streams and one partly populated row for each row in each stream where matches are not found. It combines the behaviors of the right and left joins. Here is the statement, using the same input streams as our INNER JOIN example:

Figure depicts how the streams are merged in a full join.

Full join

Full join

Cross Joins

Firebird does not provide language token support for the CROSS JOIN syntax, which produces an output set that is the Cartesian product of two tables. That is, for each row in the left stream, an output stream will be generated for each row in the right stream. In the unlikely event that you need a Cartesian product, you can use the SQL-89 syntax without any joining criteria in the WHERE clause, for example:

SELECT T1.*, T2.* FROM T1 TABLE1, T2 TABLE2;

The Firebird query engine sometimes forms cross joins internally, when constructing “rivers” during joining operations (see the “Optimization Topic” section near the end of this chapter).

Natural Joins

Firebird does not support the NATURAL JOIN (also known as EQUIJOIN), which forms sets by linking two streams on the basis of matching columns that share common column identifiers that have equal values. In Firebird, you always need to cite the join conditions.

Ambiguity in JOIN Queries

Various database theory texts will tell you that ambiguity can exist only where the same column names occur in multiple streams. The practitioner may have a different story to tell. The question of sorting out ambiguities gets down to the way database engines implement the parsing, streaming, and sorting that goes on during a join operation.

InterBase was lenient about the use of disambiguating syntax in joins, with sometimes unfortunate results. If you are migrating your existing application code from an InterBase heritage, do not feel too bad about the SQL exceptions that Firebird throws up during the first test run through queries with joins. It is showing you places in your code where, in the past, you have been allowed to submit queries that could produce wrong output.

Firebird will not accept JOIN queries containing column references that do not have full, consistent table qualification. Table qualification can be by table identifier or by table aliasing. From version 1.5 onward, it cannot be a mixture of each. If you are using version 1.0.x, take care to be consistent if you wish to avoid recoding when upgrading.

The preceding examples used the table identifier. Table aliasing is more elegant, more compact and, for certain queries (see the section “Re-entrant Joins”) it is mandatory.

Table Aliases

When the name of the table is long or complicated, or there are many tables, table aliases are useful (and, in some cases, essential) to clarify statements. The query engine treats a table alias as a synonym of the table it represents. Table aliases are mandatory in queries that form multiple streams from within the same table.

TIP

An alias can be used wherever it is valid to use the table’s name as a qualifier and all table identifiers must be substituted. Mixing table identifiers with aliases in the same query will cause exceptions from Firebird 1.5 onward. The following example uses table identifiers:

This is the same example using aliases:

Legal Table Alias Names

Use any useful string composed of up to 31 of the characters that are valid for metadata qualifiers (i.e., alphanumeric characters with ASCII codes in the ranges 35–38, 48–57, 64–90, and 97–122). Spaces, diacritics, double-quoted alias names, and alias names that begin with a numeral are not legal.

The Internal Cursor

When reading through a stream, the database engine implements a pointer, whose address changes as the read advances from top to bottom. This pointer is known as a cursor—not to be confused with the cursor set that is implemented in SQL with DECLARE CURSOR. Internal cursors are not accessible to clients except through the medium of join and subquery syntax.

The current address of an internal cursor is an absolute offset from the address of the first stream in the set, which means it can only advance forward. Internally, the optimizer utilizes indexes and organizes input streams into a plan, to ensure that a query begins returning output in the shortest possible time. The optimization process and plans are discussed in detail in the “Optimization Topic” section at the end of this chapter.

In any multi-table operation, the Firebird engine maintains one internal cursor for each stream. In the preceding JOIN examples, both TableA and TableB have their own cursors. When a match occurs, the engine creates a merged image for the output stream by copying the streams from the current addresses of the two cursors, as shown in Figure.

Internal cursors for a two-table join

Internal cursors for a two-table join

Re-Entrant Joins

Design conditions sometimes require forming a joined set from two or more streams that come from the same table. Commonly, a table has been designed with a hierarchical or tree structure, where each row in the table is logically the “child” of a “parent” in the same table. The table’s primary key identifies the child-level node of the tree, while a foreign key column stores a “parent” key referring to the primary key value of a different row.

A query to “flatten” the parent-child relationship requires a join that draws one stream for “parents” and another for “children” from the same table. The popular term for this is self-referencing join. The term re-entrant join is morphologically more appropriate, since there are other types of re-entrant query that do not involve joins. The “Subqueries” section later in this chapter discusses these other types of re-entrant queries.

Cursors for Re-Entrant Joins

To perform the re-entrant join, the engine maintains one internal cursor for each stream, within the same table image. Conceptually, it treats the two cursor contexts as though they were separate tables. In this situation, the syntax of the table references uses mandatory aliasing to distinguish the cursors for the two streams.

In the following example, each department in an organization is stored with a parent identifier that points to the primary key of its supervising department. The query treats the departments and the parent departments as if they were two tables:

Figure illustrates how the child records are selected on the re-entrant visit to the DEPARTMENT table.

Internal cursors for a simple re-entrant join

Internal cursors for a simple re-entrant join

The simple, two-layered output looks like Figure.

The output from this query is very simple: a one-level denormalization. Tree output operations on tables with this kind of structure are often recursive, using stored procedures to implement and control the recursions.

Output from the simple re-entrant join

Output from the simple re-entrant join


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Firebird Topics