Cumulative Sum Using the CSUM Function Teradata

The process of creating a cumulative sum means that data values in sequential rows are added together. The same addition was seen earlier in this when using the SUM aggregate, with some major differences that are contrasted here.

The Cumulative Sum (CSUM) function provides a running or cumulative total for a column's numeric value. This allows users to see what is happening with column totals over an ongoing progression. The results will be sorted in ascending or descending order and the sort list can consist of a single or multiple columns, listed as sort keys.

The syntax for CSUM is:

The CSUM command uses the first parameter as the column containing a numeric value to sum. This value will be added to the previous data values and provide a running or cumulative answer.

The second parameter is entered as a sort key that sequences the output rows. This column determines the major sort sequence of the detail data from the row along with the CSUM. By default, the sort sequence is ascending (ASC). The DESC can be specified to request a descending (highest to lowest) sequence. Optionally, additional sort keys can be entered to specify one or more minor sort sequences.

To understand the use of CSUM, we will use a table that stores sales data for all products for each store on a daily basis. The CSUM function can be used to show the daily sales data for any or all of the products and accumulate the sales data for both running and final totals.

The column specified in this CSUM should contain the sales dollar figure for each day and the sort key as the date. Then, the WHERE clause provides a beginning and ending date for the query.

The next SELECT accumulates the daily sales for products 1000 and 2000, and sorts on the date of the sale:

14 Rows Returned

From the above output, it is easy to see that the report is in sequence by the Sale date column. Since OLAP functions are capable of sorting on more than a single column, it might be advisable to sequence by the product ID too. Adding an ORDER BY is most everyone's first thought to provide a sort for the product ID. Although it is syntactically correct, it is not the correct approach.

You can also see the true purpose of the CSUM command. The first data row contains 41,888.88 and is added to 0 by CSUM to obtain 41,888.88. The second row returned a daily sale of 48,850.40 that is added to 41,888.88 for a CSUM of 90,739.28. This continues adding each consecutive value until the last row is processed with a final total of 637,816.53. The following demonstrates the INCORRECT way to add the product ID by using an ORDER BY:

14 Rows Returned

Notice that the final answer is the same. However, this incorrect technique produces CSUM values that are not continually increasing in value, as they should. The first value is 48,850.40 for a total of 90,739.28; this is not correct. In reality, the CSUM worked fine, but the answer output is re-sorted after the addition completed.

The next SELECT modifies the above query to eliminate the ORDER BY and make the Product ID the major sort within the CSUM sort list (appears first) and the sales date as the minor sort:

14 Rows Returned

Although the CSUM column contains the same final total result as the previous report, the intermediate results are different due to the new sequencing on the product ID. This sorting should always be requested within the CSUM function and never at the end of the query where we traditionally see most sort intentions. Otherwise, the total may be correct, but the accumulation will look incorrect. Remember that the ORDER BY sorts as the last operation before returning the rows. The addition is performed as part of the CSUM, after its sort and before the ORDER BY sort.

Using the above SQL, the only way to see totals for each product is to formulate a WHERE clause that only allows the rows for an individual product to be used. Although this works, it is not as convenient as having all products in a single output display with subtotals.

Cumulative Sum with Reset Capabilities

The CSUM has the ability to reset values throughout the accumulation process to obtain a subtotal. It uses the GROUP BY designation to specify a data column that, when the value changes, causes the accumulation value to be reset back to zero.

Using CSUM and GROUP BY

A cumulative sum may be reset to zero at a specified breakpoint by merely adding a GROUP BY clause to the query when using the CSUM function. The GROUP BY provides for subtotals within the output.

Below, another previously used SELECT is modified to show the cumulative sales with a reset (subtotal) when the date changes:

9 Rows Returned

Now the CSUM value increases for all sales on October 1, but returns to the amount of the first sale amount on October 2 and essentially starts the CSUM over at zero. Every time the value in the Sale date column changes, the CSUM value is reset to zero and the first sale on that date is added to it. This operation is a result of using the GROUP BY on the sale date. If the reset is to occur on the Product ID for total sales of each product, it should be in the sort list for the CSUM and also in the GROUP BY. This provides the equivalent of a subtotal for each of the sales on the same day.

What makes this example interesting is that traditionally, anytime a query uses aggregates with non-aggregates a GROUP BY statement must by used to add more detail data to the output. This is not the case with the OLAP functionality. Here the GROUP BY list simply provides a column to sort and break or reset when the value changes.

The next SELECT demonstrates the way to reset on Product ID and sequence the output using the sales date also:

9 Rows Returned

As seen earlier in the chapter, adding an ORDER BY is not the correct technique. Use the CSUM for all sorting whether requesting one column or several columns; do not use the ORDER BY.

Generating Sequential Numbers with CSUM

Another more obscure use of CSUM is to generate sequential numbers. Sometimes, it is helpful to have a number like this to identify individual rows from a table. For instance, at times, it is advantageous to store data with a unique identifier as a UPI instead of using actual user data. Until the CSUM became available, generating a sequential number for this use, this was a more difficult task.

The following SELECT demonstrates the ability to generate a sequential number:

9 Rows Returned

Notice that the first argument passed to the CSUM is no longer a column name. Instead, it is the literal value of 1. Since the value of the first argument is added to all the previous values, every time a row is read, this CSUM adds 1 to the previous value.

Hence, adding 1 obtains a sequential incremented value every time a row is read. Furthermore, the GROUP BY may still be used to restart the CSUM value if it is used as a part of a composite primary index.

In a normal SELECT, this is probably of little use. However, when storing data rows, an INSERT/SELECT can be built using this technique to store the sequential number as part of a new row in another table.