The GROUP BY Clause - Firebird

When a query includes a GROUP BY clause, the output of the column and table specification, namely

SELECT {<column-list} FROM <table-specification> [WHERE <search-condition>]

gets passed to a further stage of processing, where the rows are partitioned into one or more nested groups.

Each partition typically extracts summaries by way of expressions that perform some operation on groups of numeric values. This type of query is known as a grouped query and its output is often referred to as a grouped set.

This is the syntax:

SELECT <groupable-field-list> FROM <table-specification> [WHERE...] GROUP BY <grouping-item> [COLLATE collation-sequence] [, <grouping-item [..]] HAVING <grouping-column predicate> [ORDER BY ...];

The Groupable Field List

A group is formed by collecting together (aggregating) all of the rows where a column named in both the column list and the GROUP BY clause share a common value. The logic of aggregation means that the field list specified by SELECT for a grouped query is tightly restricted by the fields named as arguments in the GROUP BY clause. Fields having specifications that meet these requirements are often referred to as groupable. If you specify a column or field expression that is not groupable, the query will be rejected.

  • A database column or non-aggregating expression cannot be specified in the column list if it is not also specified in the GROUP BY clause.
  • An aggregating expression operating on a database column that is not in the GROUP BY clause can be included in the column list. The use of an alias for the result of the expression is strongly recommended.

Aggregating Expressions

Firebird has a group of aggregating functions that are typically used in combination with grouping conditions to calculate group -level totals and statistics.

The aggregating functions are SUM( ), which calculates totals; MAX( ) and MIN( ), which return the highest and lowest values, respectively; and AVG( ), which calculates averages. The COUNT( ) function also behaves as an aggregating function in grouped queries, providing a row count for all row members of the lowest level group.

Unlike other groupable items, an aggregating expression from the SELECT list cannot be used as a grouping item (see “The Grouping Item” section), since it outputs a value that is calculated from values beneath the context of the group.

The table PROJ_DEPT_BUDGET contains rows intersecting projects and departments. We are interested in retrieving a summary of the budgets allocated to each project, regardless of department. The following item list, which will be explored later in this section, specifies a field list of the two items we want:

SELECT PROJ_ID, SUM(PROJECTED_BUDGET) AS TOTAL_BUDGET FROM PROJ_DEPT_BUDGET WHERE FISCAL_YEAR = 1994 GROUP BY PROJ_ID;

The two field specifications are going to be fine as groupable items. The department ID (DEPT_NO) is not in the list, because it is the project totals that we want. To get them, we make PROJ_ID the argument of the GROUP BY clause.

On the other hand, if we wanted to list the department budgets, regardless of project, the field list would be set up for DEPT_NO to be the argument of the GROUP BY clause:

SELECT DEPT_NO, SUM(PROJECTED_BUDGET) AS TOTAL_BUDGET FROM PROJ_DEPT_BUDGET WHERE FISCAL_YEAR = 1994 GROUP BY DEPT_NO;

Effects of NULL in Aggregations

In aggregate expressions using operators like SUM( ) and AVG( ) and COUNT (<specific_column_name>), rows containing NULL in the targeted column are ignored for the aggregation. AVG( ) forms the numerator by aggregating the non-null values and the denominator by counting the rows containing non-null values.

The Grouping Item

The GROUP BY clause takes one or a list of grouping items:

  • In Firebird 1.0, a grouping item can be only a column name or an appropriate external function (UDF) expression.
  • Firebird 1.5 extended the range of options for grouping items to allow grouping also by degree, an ordinal number representing the left-to-right position of the corresponding item in the SELECT field list, parallel to the existing option for ORDER BY arguments.
  • Firebird 1.5 and later versions also have the capability to group by most function expression, such as CAST( ), EXTRACT( ), SUBSTRING( ), UPPER( ), CASE( ), and COALESCE( ).

The following statement completes the query begun by the previous example:

Restriction

A grouping item cannot be any expression involving an aggregating function, such as AVG( ), SUM( ), MAX( ), MIN( ), or COUNT( )—that would aggregate in the same grouping context (level) as any grouping item. This restriction includes any aggregate expressions that are embedded inside another expression. For example, the DSQL parser will complain if you attempt this:

SELECT PROJ_ID, SUM(PROJECTED_BUDGET) AS TOTAL_BUDGET FROM PROJ_DEPT_BUDGET WHERE FISCAL_YEAR = 1994 GROUP BY 2; ISc ERROR CODE:335544569 Cannot use an aggregate function in a GROUP BY clause

Using COUNT( ) As an Aggregating Function

The much-maligned COUNT( ) function has its legitimate uses when employed in a grouped query to output summary counts for groups. Consider the following modification to our example. The column DEPT_NO is not an eligible contender as either a groupable item or a grouping item for our requirements, but we can extract information about it in the grouping context of PROJ_ID:

Non-Aggregating Expressions

The ability to use internal and external function expressions for grouping opens up a wide range of possibilities for “massaging” stored attributes to generate output sets that would not be possible otherwise.

For example, the internal function EXTRACT( ) operates on date and time types, to return date-parts —numbers that isolate the year, month, day, hour, etc. parts of temporal types. The following v.1.5 example queries a Membership table and outputs a statistic showing how many members joined in each month, by month, regardless of the year or day of their join dates:

SELECT MEMBER_TYPE, EXTRACT(MONTH FROM JOIN_DATE) AS MONTH_NUMBER, /* 1, 2, etc. */ COUNT (JOIN_DATE) AS MEMBERS_JOINED FROM MEMBERSHIP GROUP BY MEMBER_TYPE, EXTRACT(MONTH FROM JOIN_DATE);

A plethora of useful functions is available in the external function libraries for transforming dates, strings, and numbers into items for grouping. The following example illustrates grouping by some external functions found in the distributed ib_udf library:

SELECT STRLEN(RTRIM(RDB$RELATION_NAME)), COUNT(*) FROM RDB$RELATIONS GROUP BY STRLEN(RTRIM(RDB$RELATION_NAME)) ORDER BY 2;

It will work in any version of Firebird.

Some expressions are currently disallowed inside the GROUP BY list. For example, the parser rejects a grouping item that contains the concatenation symbol ||. So, for example, the query

SELECT PROJ_ID || '-1994' AS PROJECT, SUM(PROJECTED_BUDGET) AS TOTAL_BUDGET FROM PROJ_DEPT_BUDGET WHERE FISCAL_YEAR = 1994 GROUP BY PROJ_ID || '-1994';

returns this exception in Firebird 1.5:

ISC ERROR CODE:335544569 Token unknown - line 6, char 21 ||

Using the degree number of the expression field will work around the problem:

SELECT PROJ_ID || '-1994' AS PROJECT, SUM(PROJECTED_BUDGET) AS TOTAL_BUDGET FROM PROJ_DEPT_BUDGET WHERE FISCAL_YEAR = 1994 GROUP BY 1;

Grouping by Degree (Ordinal Number)

Using the degree number of the output column in the GROUP BY clause “copies” the expression from the select list (as does the ORDER BY clause). This means that, when a degree number refers to a subquery, the subquery is executed at least twice.

The HAVING Sub-Clause

The HAVING clause is a filter for the grouped output, corresponding to the way a WHERE clause filters the incoming, ungrouped set.

A HAVING condition uses exactly the same predicating syntax as a WHERE condition, but it should not be confused with the WHERE clause. The HAVING filter is applied to the groups after the set has been partitioned. You may still need a WHERE clause to filterthe incoming set.

In Firebird 1.0.x, you can specify a HAVING condition using columns that are not included in the groupable items—“a lazy WHERE clause” and a “bug” in terms of standards conformance. From v.1.5, only groupable items can be used with HAVING. It is important to recognize the impact of the HAVING condition on performance. It is processed after grouping is done. If it is used instead of WHERE conditions, to filter out unwanted members returned in groups named in the GROUP BY list, rows are wastefully double-processed, only to be eliminated when it is nearly all done.

For best performance, use WHERE conditions to pre-filter named groups, and use HAVING for filtering on the basis of results returned by aggregating expressions. For example, a group total calculated using a SUM(x) expression would be filtered by HAVING SUM(x) > < minimum-value>. A HAVING clause thus typically takes an aggregate expression as its argument.

Taking the previous query, we can use a WHERE clause to filter the project groups that are to appear in the output and a HAVING clause to set the starting range of the totals that we want to view:

The HAVING clause can take complex ANDed and ORed arguments, using the same precedence logic as the WHERE clause.

The COLLATE Sub-Clause

If you want a text grouping column to use a different collation sequence from the one defined as default for it, you can include a COLLATE clause.

Using ORDER BY in a Grouped Query

From v.1.5, items listed in the ORDER BY clause of a grouped query must be either aggregate functions that are valid for the grouping context or items that are present in the GROUP BY list.

Firebird 1.0.x is less restrictive—it will permit ordering on items or expressions that are out of the grouping context.

Advanced Grouping Conditions

Firebird 1.5 and later versions support some advanced grouping conditions that are not available in v.1.0.x.

Subqueries with Embedded Aggregations

A groupable field that is a correlated subquery expression can contain an aggregate expression that refers to an aggregate expression item in the GROUP BY list.

In the following example, a re-entrant subquery on the system table RDB$RELATION_FIELDS contains an aggregate expression (MAX(r.RDB$FIELD_POSITION) whose result is used to locate the name (RDB$FIELD_NAME) of the column having the highest field position number for each table (RDB$RELATION_NAME) in the database:

Along the same lines, this time we use COUNT( ) to aggregate the number of columns stored in each table:

Aggregations with Embedded Subqueries

An aggregating function expression—COUNT, AVG, etc.—can take an argument that is a subquery expression returning a scalar result. For example, the following query passes the result of a SELECT COUNT(*) query to a higher level SUM( ) expression that outputs for each table (RDB$RELATION_NAME) the product of the field count and the number of indexes on that table:

Aggregations at Mixed Grouping Levels

Aggregate functions from different grouping levels can be mixed in the same grouped query.

In the following example, an expression result based on a subquery that does a COUNT( ) on a column in a lower level group (RDB$INDICES) is passed as output to the grouping level. The HAVING clause performs filtering predicated on two further aggregations on the lower level group.

You can get results from running this query in Firebird 1.0.x, but they will be incorrect.

Nesting Aggregate Functions

An aggregating expression can be nested inside another aggregate expression if the inner aggregate function is from a lower context, as illustrated in the previous query.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Firebird Topics