Group Functions and the GROUP BY Clause - Oracle DBA

This section explains how you can group rows together and perform some kind of aggregate operation on them. For example, you may want to count the rows for a given condition, calculate the averages of numeric columns, or find the highest or lowest value for a given column in a query result.

Aggregate A type of function in Oracle SQL that performs a calculation or transformation across multiple rows in a table, rather than just on a single row.

The GROUP BY clause fits into the SELECT statement as follows:

All group functions ignore NULLs by default. If you wanted to calculate the average commission across employees, you would most likely not want to consider employees who are not in the sales area (and therefore have a NULL commission value). On the other hand, you might want to treat NULL values numerically in other situations. You will see later in this chapter how you can convert NULL values with the NVL function.

Group Functions

Table lists some of the most commonly used group functions in SQL statements. The COUNT function is the only aggregate function that will count rows with NULL values in any column when * is used as an argument.

Common group function

Common group function

All of the functions listed in Table have a calling sequence as follows:

As mentioned earlier, the COUNT function allows for * as its only argument, to specify that rows are to be counted, whether or not they have NULL values. The COUNT, MIN, and MAX functions apply to date and string expressions in addition to numeric expressions; the rest must have numeric arguments only.
The DISTINCT keyword indicates that duplicates are to be removed before the aggregate calculation is done. For example, calculating AVG(SALARY) versus AVG(DISTINCT SALARY) would be quite different if most of the employees are at one end of the pay scale. ALL is the default.

The boss, King, wants to get more information on salary distribution by department, so he asks Janice to give him the count of employees and the average salary and commission for his department, which has a department ID of 90. Janice runs the following query:

Common group function

1 row selected.

Notice that the average commission in this case is not zero but NULL; there were no employees in department 90 with a commission. The result would have been non-NULL, if there were at least one employee who worked on a commission for part of their salary.

The next morning, the boss asks the same question for department 80, which has the bulk of the commissioned employees. Janice gets the answer with this query:

Group-Functions

1 row selected.

Janice hears rumors that King is going to ask for a breakdown of the number of employees, how many are on commission, and how many distinct commission percentages there are. She comes up with this query:

Group-Functions

1 row selected.

What does this tell King? The total number of employees is 107, regardless of whether there are any NULL values in any of the columns. Of those employees, 35 are on commission (have a non-NULL value for COMMISSION_PCT), and out of those 35, there are seven different commission levels in force at the company.

Janice also suspects that King will be asking for some statistics for other departments. Rather than run the same query for different department numbers, she decides that it might be worthwhile to use the GROUP BY function to give King all the information he needs in a single query.

The GROUP BY Clause

The GROUP BY clause is used to break down the results of a query based on a column or columns. Once the rows are subdivided into groups, the aggregate functions described earlier in chapter can be applied to these groups. Note the following rules about using the GROUP BY clause:

  • All columns in a SELECT statement that are not in the GROUP BY clause must be part of an aggregate function.
  • The WHERE clause can be used to filter rows from the result before the grouping functions are applied.
  • The GROUP BY clause also specifies the sort order; this can be overridden with an ORDER BY clause.
  • Column aliases cannot be used in the GROUP BY clause.

Janice has been busy preparing a report for King that will break down the salary and commission information by department. Her first query looks like this:

GROUP BY Clause

12 rows selected.

This gives King a breakdown, by department, of the employee count, the average salary, and the average commission. NULLs are not included in the calculation for commission or salary. King likes this report, but Janice suspects that he will be asking for something different tomorrow.

One of the departments has a NULL value. There is one employee who has not yet been assigned to a department, but this employee does have a salary and a commission.

As expected, King calls the next day with another request. He wants to see how the salaries and commissions break out within department by job function. Janice realizes that all she needs to do is to add the job ID to the query in both the SELECT clause and the GROUP BY clause:

GROUP-BY-Clause

20 rows selected.

As a side benefit, this also gives King the breakdown of jobs within each department.

Using NVL with Group Functions

As mentioned earlier in this chapter, group functions will ignore NULL values in their calculations. In most cases, this makes a lot of sense. For example, if only a small handful of employees worked on commission, and you calculated the average commission with the assumption that a NULL commission was essentially a zero commission, then the average commission would be quite low!

How you should interpret NULL values in a column depends on the business rules of the company and what NULL values represent. An average commission is usually based on only those employees who work on commission, and, in this case, the default behavior of Oracle's grouping functions makes sense.

However, there may be times when it makes sense to convert NULL values to something that can be aggregated. Let's assume for the moment that there is a column called COMMISSION_AMT in the EMPLOYEES table that records the latest monthly commission received by that employee. Just as with the COMMISSION_PCT column, the COMMISSION_AMT field is NULL for all employees except those in the Sales department. If King wanted a report of the average salary and commission (if any) by department, the expression

in the SELECT clause would give results for only those rows with non-NULL commissions. That would not be what King was looking for. Janice would need to essentially convert any NULL values to zero. This is what NVL will do, and the expression above can be rewritten as

For each row, if the COMMISSION_AMT is NULL, it is converted to zero (or any other amount you want) and added to SALARY, and the average is returned after all rows have been read.

The HAVING Clause

The HAVING clause is analogous to the WHERE clause, except that the HAVING clause applies to aggregate functions instead of individual columns or single-row function results. A query with a HAVING clause still returns aggregate values, but those aggregated summary rows are filtered from the query output based on the conditions in the HAVING clause.

The HAVING clause fits into the SELECT syntax as follows:

The queries that Janice wrote for King have the information he needs, but his time is limited and he wants to see only the breakdowns for the department and job combinations that have average salaries over $10,000. Janice takes the original query

and adds a HAVING clause that removes the lower average salaries:

The HAVING Clause

7 rows selected.

Janice becomes proactive again, and she anticipates that King will want to see only certain jobs in the report. She can easily add a WHERE clause to select only administrative and sales positions. She uses the LIKE clause to select these job functions:

The HAVING Clause

3 rows selected.

The order of the WHERE, GROUP, and HAVING clauses does not change how the query is run or the results; however, the ordering shown here is indicative of how the SQL engine processes the command. If an ORDER BY clause was needed in the above query, it could be placed anywhere after the SELECT clause but would most logically belong at the end of the query.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Oracle DBA Topics