CUBE Extension to GROUP BY - Data Warehousing

CUBE takes a specified set of grouping columns and creates subtotals for all of their possible combinations. In terms of multidimensional analysis, CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions. If you have specified CUBE(time, region, department), the result set will include all the values that would be included in an equivalent ROLLUP statement plus additional combinations. For instance, in the figure, the departmental totals across regions (279,000 and 319,000) would not be calculated by a ROLLUP(time, region, department) clause, but they would be calculated by a CUBE(time, region, department) clause. If n columns are specified for a CUBE, there will be 2 to the n combinations of subtotals returned. The following example gives an example of a three-dimension cube. See Oracle Database SQL Reference for syntax and restrictions.

When to Use CUBE
Consider Using CUBE in any situation requiring cross-tabular reports. The data needed for cross-tabular reports can be generated with a single SELECT using CUBE. Like ROLLUP, CUBE can be helpful in generating summary tables. Note that population of summary tables is even faster if the CUBE query executes in parallel.

CUBE is typically most suitable in queries that use columns from multiple dimensions rather than columns representing different levels of a single dimension. For instance, a commonly requested cross-tabulation might need subtotals for all the combinations of month, state, and product. These are three independent dimensions, and analysis of all possible subtotal combinations is commonplace. In contrast, a cross-tabulation showing all possible combinations of year, month, and day would have several values of limited interest, because there is a natural hierarchy in the time dimension. Subtotals such as profit by day of month summed across year would be unnecessary in most analyses. Relatively few users need to
ask What were the total sales for the 16th of each month across the year?

CUBE Syntax
CUBE appears in the GROUP BY clause in a SELECT statement. Its form is:

SELECT … GROUP BY CUBE (grouping_column_reference_list)

Example CUBE

SELECT channel_desc, calendar_month_desc, countries.country_iso_code,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels, countries
WHERE sales.time_id=times.time_id AND sales.cust_id=customers.cust_id AND
sales.channel_id= channels.channel_id AND channels.channel_desc IN
('Direct Sales', 'Internet') AND times.calendar_month_desc IN
('2000-09', '2000-10') AND countries.country_iso_code IN ('GB', 'US')
GROUP BY CUBE(channel_desc, calendar_month_desc, countries.country_iso_code);

CUBE
This query illustrates CUBE aggregation across three dimensions.

Partial CUBE
Partial CUBE resembles partial ROLLUP in that you can limit it to certain dimensions and precede it with columns outside the CUBE operator. In this case, subtotals of all possible combinations are limited to the dimensions within the cube list (in parentheses), and they are combined with the preceding items in the GROUP BY list.

The syntax for partial CUBE is as follows:

GROUP BY expr1, CUBE(expr2, expr3)

This syntax example calculates 2*2, or 4, subtotals. That is:

  • (expr1, expr2, expr3)
  • (expr1, expr2)
  • (expr1, expr3)
  • (expr1)

Example Partial CUBE

Using the sales database, you can issue the following statement:

SELECT channel_desc, calendar_month_desc, countries.country_iso_code,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels, countries
WHERE sales.time_id = times.time_id AND sales.cust_id = customers.cust_id AND
sales.channel_id = channels.channel_id AND channels.channel_desc IN
('Direct Sales', 'Internet') AND times.calendar_month_desc IN
('2000-09', '2000-10') AND countries.country_iso_code IN ('GB', 'US')
GROUP BY channel_desc, CUBE(calendar_month_desc, countries.country_iso_code);

Partial CUBE

Partial CUBE
Calculating Subtotals Without CUBE
Just as for ROLLUP, multiple SELECT statements combined with UNION ALL statements could provide the same information gathered through CUBE. However, this might require many SELECT statements. For an n-dimensional cube, 2 to the n SELECT statements are needed. In the three-dimension example, this would mean issuing SELECT statements linked with UNION ALL. So many SELECT statements
yield inefficient processing and very lengthy SQL.

Consider the impact of adding just one more dimension when calculating all possible combinations: the number of SELECT statements would double to 16. The more columns used in a CUBE clause, the greater the savings compared to the UNION ALL approach.

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Data Warehousing Topics