The SELECT Statement - Firebird

The SELECT statement is the fundamental method for clients to retrieve sets of data from the database. It has the following general form:
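(The outline below is a condensed sketch assembled from the clauses covered in this section; square brackets mark optional elements, and the exact syntax supported should be checked against the release notes for your server version.)

SELECT [FIRST (m)] [SKIP (n)] [ALL | DISTINCT]
  <list-of-columns>
FROM <table-or-view-or-procedure> [alias]
  [JOIN <table-or-view-or-procedure> [alias] ON <join-conditions>]
  [WHERE <search-conditions>]
  [GROUP BY <grouped-column-list>]
  [HAVING <group-search-conditions>]
  [UNION <select-specification> [ALL]]
  [PLAN <plan-expression>]
  [ORDER BY <column-list>]
  [FOR UPDATE [OF col1 [, col2 ...]] [WITH LOCK]]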

Clauses in a SELECT Statement

In the following topics, we take an introductory look at each allowable clause of the SELECT statement. Most clauses are optional, but those you do use must appear in the correct order.

Optional Set Quantifiers

Following the SELECT keyword, a set quantifier can be included to govern the inclusion or suppression of rows in the output set once they have met all other conditions.

ALL

This is the default quantifier for output lists and is usually omitted. It returns all rows that meet the conditions of the specification.

DISTINCT

This row quantifier suppresses all duplicate rows in the sets that are output. For example, the table EMPLOYEE_PROJECT stores an intersection record for each employee (EMP_NO) and project (PROJ_ID) combination, resolving a many-to-many relationship. EMP_NO + PROJ_ID forms the primary key.

SELECT DISTINCT EMP_NO, PROJ_ID FROM EMPLOYEE_PROJECT;

returns 28 rows—that is, all of them, which is the same as SELECT [ALL], because every
occurrence of (EMP_NO + PROJ_ID) is, by nature, distinct.

SELECT DISTINCT EMP_NO FROM EMPLOYEE_PROJECT;

and

SELECT DISTINCT PROJ_ID FROM EMPLOYEE_PROJECT;

return 22 and 5 rows, respectively.

Evaluation of distinctness is applied to all of the output columns, making it useful in some queries that use joins to produce a denormalized set. Test it well to ensure that it produces the result you expect.

FIRST (m) SKIP (n)

The optional keywords FIRST (m) and/or SKIP (n), if used, precede all other specifications. They provide the option to select the first m rows from the output of an ordered set and to ignore the first n rows of an ordered set, respectively. It does not make sense to use this construct with an unordered set. The ORDER BY clause obviously needs to use an ordering condition that actually makes sense with regard to the selection of the candidate rows.

The two keywords can be used together or individually. The arguments m and n are integers, or expressions that resolve to integers. The brackets around the m and n values are required for expression arguments and optional for simple integer arguments.

Since FIRST and SKIP operate on the set that is output by the rest of the specification, they cannot be expected to make the query execute faster than the full specification would otherwise do. Any performance benefit comes from the reduction of traffic across the wire.

The following example will return five rows, starting at row 101 in the ordered set:
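(A sketch only; the MEMBERSHIP table and its columns are assumed for illustration.)

SELECT FIRST 5 SKIP 100 ID, LAST_NAME, FIRST_NAME
FROM MEMBERSHIP
ORDER BY LAST_NAME, FIRST_NAME;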

SELECT <List of Columns>

The SELECT clause defines the list of columns that are to be returned in the output set. It must contain at least one column, which does not have to be a column that exists in a table. That statement is not as strange as it sounds. The column list is really an output specification and this is data manipulation language (DML). The output specifications can include any of the following:

  • The identifier of a column that is stored in a table, specified in a view, or declared as an output argument to a stored procedure. Under some conditions, the column identifier must be qualified with the name or alias of the table it belongs to.
  • A simple or complex expression, accompanied by a runtime identifier.
  • A constant value, accompanied by a runtime identifier.
  • A server context variable, accompanied by a runtime identifier.
  • The * symbol, popularly known as “select star,” which specifies every column. Although SELECT * does not preclude selecting one or more columns from the same table individually, it does not make sense to do so in general. To include a duplicated column for a special purpose, apply the AS keyword and an alias to it and return it as a computed (read-only) field.

The following SELECT specifications are all valid. Simple list of columns:

SELECT COLUMN1, COLUMN2 ...

Qualified column names, required in multi-table specifications:
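SELECT TABLEA.COLUMN1, TABLEB.COLUMN2 ... /* TABLEA and TABLEB are placeholder names */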

Expression (aggregating):

SELECT MAX(COST * QUANTITY) AS BEST_SALE ...

Expression (transforming):
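SELECT COLUMN1 || ' ' || COLUMN2 AS FULL_NAME ... /* placeholder columns, concatenated into one output field */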

Variables and constants:
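SELECT 'A string constant' AS A_LABEL, CURRENT_DATE AS TODAYS_DATE ... /* a constant and a context variable, each with an alias */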

All columns in a table:

SELECT * ...

Quantifiers:

SELECT FIRST 5 SKIP 100 ACOLUMN, BCOLUMN ... /* this makes no sense in the absence of a later ORDER BY */

Expressions and Constants As Output Fields

A constant or an expression—which may or may not include a column name—can be returned as a read-only, runtime output field. The field should be given a column name of its own, unique in all sets involved. Names for runtime columns are known as column aliases. For clarity, the column alias can optionally be marked with the keyword AS.

Taking the aggregating expression from an earlier example, BEST_SALE is the column alias given to the output of MAX(COST * QUANTITY):
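SELECT MAX(COST * QUANTITY) AS BEST_SALE
FROM SALES; /* the SALES source table is assumed for illustration */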

Constants, many different kinds of expressions involving functions and calculations, and scalar subqueries (including correlated subqueries) can be massaged into read-only output fields.

For details about expressions and functions, refer to the next chapter.

FROM <Table-or-Procedure-or-View>

The FROM clause specifies the source of data, which may be a table, a view, or a stored procedure that has output arguments. If the statement involves joining two or more structures, the FROM clause specifies the leftmost structure. Other tables are added to the specification by way of succeeding JOIN ... ON clauses (see the upcoming section “JOIN <Specification>”).

In the following examples, some FROM clauses are added to the SELECT specifications in the previous set of examples:
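SELECT COLUMN1, COLUMN2
FROM TABLEA; /* placeholder names, as in the earlier sketches */

SELECT MAX(COST * QUANTITY) AS BEST_SALE
FROM SALES;

SELECT FIRST 5 SKIP 100 ACOLUMN, BCOLUMN
FROM ATABLE
ORDER BY BCOLUMN;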

SQL-89 Inner Join Syntax

Firebird provides backward support for the deprecated SQL-89 implicit inner join syntax, for example:
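SELECT TABLEA.COLUMN1, TABLEB.COLUMNB
FROM TABLEA, TABLEB
WHERE TABLEA.ID = TABLEB.TABLEA_ID; /* placeholder tables; the join condition sits in the WHERE clause */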

JOIN <Specification>

Use this clause to add the names and joining conditions of the second and each subsequent data stream (table, view, or selectable stored procedure) that contributes to a multi-table SELECT statement—one JOIN ... ON clause for each source set. The following statement illustrates a simple inner join between the two tables from the previous example:
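SELECT TABLEA.COLUMN1, TABLEB.COLUMNB
FROM TABLEA
JOIN TABLEB ON TABLEA.ID = TABLEB.TABLEA_ID;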

Table Aliases

In the same statement fragment, table identifiers can be optionally substituted with table aliases, for example:
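SELECT A.COLUMN1, B.COLUMNB
FROM TABLEA A
JOIN TABLEB B ON A.ID = B.TABLEA_ID;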

WHERE <Search-Conditions>

Search conditions limiting the rows for output are located in the WHERE clause. Search conditions can vary from a simple match condition for a single column to complex conditions involving expressions; AND, OR, and NOT predicates; type casting; character set and collation conditions; and more.

The WHERE clause is the filtering clause that determines which rows are candidates for inclusion in the output set. Those rows that are not eliminated by the search conditions of the WHERE clause may be ready for output to the requestor, or they may “go forward” for further processing: ordering by an ORDER BY clause, with or without consolidation by a GROUP BY clause.

The following simple examples illustrate some WHERE clauses using a sampling of conditions to limit the rows selected:
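(The EMPLOYEE table and its columns in these sketches are assumed for illustration.)

SELECT * FROM EMPLOYEE
WHERE LAST_NAME = 'Smith';

SELECT EMP_NO, LAST_NAME, SALARY
FROM EMPLOYEE
WHERE SALARY BETWEEN 30000 AND 50000
  AND (DEPT_NO = 600 OR DEPT_NO = 621);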

The next chapter is devoted to the topic of expressions and predicates for defining search conditions.

Parameters in WHERE Clauses

Data access interfaces that implement the Firebird API have the capability to process the constants in search conditions as replaceable parameters. Thus, an application can set up a statement with dynamic search conditions in the WHERE clause, for which values do not need to be assigned until just before execution. This capability is sometimes referred to as late binding.
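In DSQL, the placeholder for a replaceable parameter is a question mark. A sketch of a statement that an application might prepare ahead of execution (table and columns assumed):

SELECT EMP_NO, LAST_NAME, SALARY
FROM EMPLOYEE
WHERE DEPT_NO = ? AND SALARY > ?;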

For more details, see the upcoming section “Using Parameters.”

GROUP BY <Grouped-Column-List>

The output from the SELECT statement can optionally be partitioned into one or more nested groups that aggregate (summarize) the sets of data returned at each nesting level. These groupings often include aggregating expressions: expressions containing functions that operate on multiple values, such as totals, averages, row counts, and minimum/maximum values.

The following simple example illustrates a grouping query. The SQL aggregating function SUM() is used to calculate the total items sold for each product type:
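(A sketch; the SALES table and its QUANTITY column are assumed, while PRODUCT_TYPE is the column referred to in the surrounding text.)

SELECT PRODUCT_TYPE, SUM(QUANTITY) AS ITEMS_SOLD
FROM SALES
GROUP BY PRODUCT_TYPE;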

The output might be similar to the following:

Firebird provides an extensive range of grouping capability, with very strict rules governing the logic.

Aggregating expressions are discussed in the next chapter.

HAVING <Grouping-Column Predicate>

The optional HAVING clause may be used in conjunction with a grouping specification to include or exclude rows or groups, similar to the way the WHERE clause limits row sets. Often, in a grouping query, a HAVING clause can replace the WHERE clause. However, because HAVING conditions are applied to the intermediate set after it has been grouped, it is usually more economical to use WHERE conditions to limit the incoming rows and to reserve HAVING conditions for limiting the groups themselves.

The previous example is modified by a HAVING clause, to return just the PRODUCT_TYPE that has been sold in quantities greater than 100:
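(Continuing the assumed SALES example from the GROUP BY section.)

SELECT PRODUCT_TYPE, SUM(QUANTITY) AS ITEMS_SOLD
FROM SALES
GROUP BY PRODUCT_TYPE
HAVING SUM(QUANTITY) > 100;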

This is the output:

UNION <Select-Specification>

UNION sets are formed by combining two or more separate query specifications, which may involve different tables, into one output set. The only restriction is that the output columns in each separate output specification must match by degree, type, and size. That means they must each output the same number of columns in the same left-to-right order and that each column must be consistent throughout in data type and size.

By default, a UNION set suppresses duplicates in the final output set. To retain all duplicates, include the keyword ALL.
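A sketch of a union, assuming an EMPLOYEE table and a CUSTOMER table whose name and phone columns are of compatible type and size:

SELECT LAST_NAME, FIRST_NAME, PHONE_NO FROM EMPLOYEE
UNION
SELECT CONTACT_LAST, CONTACT_FIRST, PHONE_NO FROM CUSTOMER;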

PLAN <Plan-Expression>

The PLAN clause allows a query plan to be optionally included in the query specification. It is an instruction to the optimizer to use particular indexes, join order, and access methods for the query. The optimizer creates its own plan when it prepares a query statement. You can view the plan in isql and many of the available GUI utilities. Usually, “the optimizer knows best,” but sometimes it can be worthwhile to experiment with variations to the optimizer’s plan when a query is slow.
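A sketch of a plan instruction, assuming an EMPLOYEE table that has an index named NAMEX on its name columns:

SELECT EMP_NO, LAST_NAME, FIRST_NAME
FROM EMPLOYEE
WHERE LAST_NAME STARTING WITH 'S'
PLAN (EMPLOYEE INDEX (NAMEX));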

ORDER BY <Column-List>

Use this clause when your output set needs to be sorted. For example, the following gets a list of names in alphabetical order by last name and first name:
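(The EMPLOYEE table is assumed.)

SELECT EMP_NO, LAST_NAME, FIRST_NAME
FROM EMPLOYEE
ORDER BY LAST_NAME, FIRST_NAME;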

Unlike GROUP BY columns, ORDER BY columns do not have to be present in the output specification (SELECT clause). The identifier of any ordering column that also appears in the output specification can be replaced by its position number in the output spec, counting from left to right:
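SELECT EMP_NO, LAST_NAME, FIRST_NAME
FROM EMPLOYEE
ORDER BY 2, 3; /* 2 = LAST_NAME, 3 = FIRST_NAME */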

Pay close attention to the indexes on columns that are going to be used for sorting.

The FOR UPDATE Clause

This is the syntax:

[FOR UPDATE [OF col1 [,col2..]] [WITH LOCK]]

Broadly, the FOR UPDATE clause is meaningless except in the context of a SELECT statement that is used to specify a named cursor. It instructs the engine to wait for a FETCH call, fetch one row into the cursor for a “current row” operation, and then wait for the next FETCH call. As each row is fetched, it becomes earmarked for an update operation.

The optional sub-clause OF <column-list> can be used to provide a list of fields present in the cursor that are allowed to be updated.
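A sketch of the syntax in a cursor specification (the table and columns are assumed):

SELECT EMP_NO, SALARY
FROM EMPLOYEE
WHERE DEPT_NO = 600
FOR UPDATE OF SALARY;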

  • In ESQL applications, a DECLARE CURSOR statement is used to set up the named cursor. For more information, refer to the InterBase 6.0 Embedded SQL documentation.
  • Applications providing DSQL interfaces must use the isc_dsql_set_cursor_name function to obtain a named cursor and use FOR UPDATE meaningfully.

Dynamic Applications

Because DSQL does not surface FETCH as a language element, application interfaces implement it using an API function call named isc_dsql_fetch.

The API “knows” the order and format of the output fields because of the descriptive structures—named Extended SQL Descriptor Areas, or XSQLDAs—that dynamic SQL applications are required to pass to it. One XSQLDA contains an array of complex variable descriptors named SQLVARs, one for each output field.

Client application interfaces use isc_dsql_fetch to request a row of output, which is a freshly populated XSQLDA. The typical behavior for many modern client application interfaces is to provide looping calls to isc_dsql_fetch, in order to receive output rows in batches and buffer them into client-side structures that are variously known as recordsets, datasets, or result sets.

Some API application interfaces implement a named cursor and surface FOR UPDATE behavior, but most do not.

The WITH LOCK Sub-clause

Firebird 1.5 introduced an optional WITH LOCK extension, for use with or without the FOR UPDATE clause syntax, to support a restricted level of explicit, row-level pessimistic locking. Pessimistic locking is antithetical to the transaction architecture of Firebird and introduces complexity. Its use is recommended only by developers with an advanced understanding of how multi-user concurrency is implemented in Firebird.
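A minimal sketch of the syntax (the DOCUMENT table is assumed; use it only with a full understanding of the locking implications):

SELECT DOC_ID, DOC_STATUS
FROM DOCUMENT
WHERE DOC_ID = 1234
FOR UPDATE WITH LOCK;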

Queries that Count Rows

An entrenched practice exists among some programmers of designing applications that need to perform a row count on output sets. Firebird does not have a quick or reliable way to return the number of rows that will be returned in an output set. Because of its multi-generational architecture, Firebird has no mechanism to “know” the cardinality of rows in persistent tables. If an application must have a row count, it can get an approximation using a SELECT COUNT(*) query.

SELECT COUNT (*) Queries

A SELECT statement with the COUNT() function call replacing a column identifier will return the approximate cardinality of a set defined by a WHERE clause. COUNT() takes practically anything as an input argument: a column identifier, a column list, the keyword symbol * representing “all columns,” or even a constant.

For example, all of the following statements are equivalent, or nearly so. However, SELECT COUNT(<some-column-name>) does not include in the count any rows where <some-column-name> is NULL:
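SELECT COUNT(*) FROM EMPLOYEE; /* EMPLOYEE is assumed; EMP_NO is taken to be its non-nullable key */
SELECT COUNT(EMP_NO) FROM EMPLOYEE;
SELECT COUNT(1) FROM EMPLOYEE;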

COUNT(*) is a very costly operation, since it can work only by walking the entire dataset and literally counting each row that is visible to the current transaction as committed. It should be treated as a “rough count,” since it will become out of date as soon as any other transaction commits work.

Although it is possible to include COUNT(*) as a member of an output set that includes other columns, it is neither sensible nor advisable. It will cause the entire dataset to be walked each time a row is selected for output.

The exception is when COUNT(*) is included in an output set being aggregated by a GROUP BY clause. Under those conditions, the count is cheap—it will be performed on the aggregated group in the course of the aggregation process, for example:
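SELECT JOB_CODE, COUNT(*) AS EMPLOYEE_COUNT
FROM EMPLOYEE
GROUP BY JOB_CODE; /* EMPLOYEE and JOB_CODE are assumed for illustration */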

Existence Checking

Do not use SELECT COUNT(*) as a way to check for the existence of rows meeting some criteria. This technique frequently shows up in applications that have had their “back-end” upgraded to Firebird from file-based, table-locking databases such as Paradox or MySQL, and it needs to be abandoned. Instead, use the EXISTS() function predicate, which is designed for the purpose and is very fast. Refer to the next chapter for details of EXISTS() and other function predicates.
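A sketch of the pattern, reusing the EMPLOYEE_PROJECT table from earlier in this section (the literal value is illustrative, and RDB$DATABASE is the single-row system table); the query returns one row if any matching row exists and no rows otherwise:

SELECT 1 FROM RDB$DATABASE
WHERE EXISTS (
  SELECT 1 FROM EMPLOYEE_PROJECT
  WHERE EMP_NO = 71);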

“Next Value” Calculations

Another technique that must be abandoned in Firebird is using COUNT(*) and adding 1 to “generate” the next value for a primary key. It is unsafe in any multi-user DBMS that isolates concurrent tasks. In Firebird, it is also extremely slow, because the table management system has no “file of records” that can be counted by the computer’s file management methods.

Use generators for anything that needs a unique numbering sequence.
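A minimal sketch of the generator approach (the generator name is illustrative):

CREATE GENERATOR GEN_ORDER_ID;

/* fetch the next value in the sequence */
SELECT GEN_ID(GEN_ORDER_ID, 1) AS NEXT_ID FROM RDB$DATABASE;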

Variations with COUNT()

The result of COUNT() will never be NULL, because it counts rows. If the count is taken over an empty set, it returns zero. It can never be negative.

COUNT(*) in a table counts all rows, with no conditions on the existence of data in columns. It can use an index if the query contains a WHERE condition that the optimizer can match to it. For example, a statement along the lines of the following sketch (the EMPLOYEE table is assumed) may be a little less costly if there is an index on LAST_NAME:
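SELECT COUNT(*) FROM EMPLOYEE
WHERE LAST_NAME STARTING WITH 'S';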

COUNT(column_name) counts only rows where column_name is not NULL.

COUNT(DISTINCT column_name) counts only the distinct values in that column. That is, all repetitions of the same value are accounted for as one counted item.

With COUNT(DISTINCT..), if the column allows NULL, then all rows holding NULL in the column are excluded from the count. If you must count them, it can be done with a “hack”:
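(One possible form of the workaround, sketched with an assumed table ATABLE and nullable column COLX; RDB$DATABASE is a system table containing exactly one row, so the subquery contributes 1 to the total when any NULL is present and 0 otherwise.)

SELECT COUNT(DISTINCT COLX)
  + (SELECT COUNT(*) FROM RDB$DATABASE
     WHERE EXISTS (SELECT 1 FROM ATABLE WHERE COLX IS NULL))
  AS DISTINCT_INCLUDING_NULL
FROM ATABLE;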


