Moving Sum Using the MSUM Function Teradata

Compatibility: Teradata Extension

A moving sum incorporates the same addition as seen in the CSUM function. However, the aspect of moving involves establishing a window based on a number of rows to be used in the addition.

The Moving Sum (MSUM) function provides the moving sum or total on a column's value, based on a defined number of rows. The number of rows is referred to as the query width. Like the CSUM, the MSUM defaults to sorting the results in ascending order.

The syntax for MSUM is:

The MSUM uses the first parameter as the column containing a numeric value to sum.

The second parameter of the MSUM is the width number. It represents the number of rows included in the summation. Valid values are from 1 to 4096. If the number of rows is less than the width defined then the calculation will be based on the rows present and it will act like the CSUM. Although a value of 1 is allowed, it makes no sense to sum every row.

The third parameter is entered as one or more sort keys that sequence the spooled rows. The sort will determine the sequence of the detail row data for the MSUM operation. By default, the sort sequence is ascending (ASC). The DESC can be specified to request a descending (highest to lowest) sequence. Optionally, additional sort keys can be entered to request one or more minor sort sequences. Again, this sort is performed inside the MSUM and not by the ORDER BY statement at the end of the query.

The next SELECT shows a 3-row (day) moving sum for all products from the Sales table using MSUM.


9 Rows Returned

SELECT shows a 3-row (day) moving sum for all products from the Sales table using MSUM
SELECT shows a 3-row (day) moving sum for all products from the Sales table using MSUM

In the above report, notice how the MSUM on the amount of the daily sales continues to increase until September 29. At that time, it drops by 26792.64 (row 1=61301.77 minus row 4=34509.13). In other words, with a width of 3, as the next row for product 3000 comes into the MSUM, the first row drops out of the total. By the time the last row comes into the sum operation it adds all three rows for September 30 to arrive at a total of 129,718.96.

So, the first few lines of output, less than the value of <width>, are the sum of those lines. However, after the initial <width> has been reached, the value will always be for the number of lines specified by <width> in the MSUM until the last row of the output has been returned.

The following SELECT modifies the above query for a 5-day window, makes the Product ID the major sort within the MSUM sort list (appears first) and the sales date as the minor sort:


14 Rows Returned

msum
msum

Now, 5 rows go into the accumulation and the value fluctuates for product 1000 on October 3 as the sixth row come into the operation and the first row drops out. This continues for each five rows until arriving at the 216,722.93 for the last five rows for product 2000. This allows us to see trends during certain time frames of our choosing. When we see a big or small number jump out at us in the accumulation we can investigate why. Did we run an ad campaign or have a sale? Was it a holiday?

Moving Sum with Reset Capabilities

The moving sum operation can also be written to provide the equivalence of a subtotal. When the GROUP BY designation is added to the query it indicates which column or columns to monitor for a change to occur. It also indicates the order of the sort to perform. When the data value in the column changes, the GROUP BY causes the accumulation value to be reset back to zero. Hence a subtotal is provided within a moving window.

Using MSUM and GROUP BY

A moving sum may be reset to zero at a specified breakpoint. Like the CSUM, this is accomplished by merely adding a GROUP BY clause to the query using the MSUM. Although, it is important to remember that this is an additional reset because the reset also occurs when the width has been exceeded by new rows being read and used in the moving window.

The next SELECT produces a report like the above using the MSUM, however, it is sorting and breaking on the Product ID instead of the sale date and limits the scope of the products to1000 and 2000:


14 Rows Returned

Using MSUM and GROUP BY

Now the report demonstrates a 3-day moving average by product instead of for all products. This is a more meaningful report. The sales volume continues to move with the next three days (rows), but the reset occurs when the first occurrence of product 2000 is encountered. At that time, the counter returns to zero and increases for 3 rows as determined by <width> and then fluctuates based on the new value from the next row and the removal of the value from the row that was read <width> rows ago. In simple terms you can almost pick any row and examine the MSUM column. You will notice that the calculation was made using the current row plus the two previous rows only. This is the purpose of the width.

Now we can look for trends to see where the data seems to jump high or low and then we can utilize our SQL to probe deeper to find why. Once we see that in one 3-day period we sold double the amount we usually do we might find that we had ran an advertising campaign that day or provided coupons.

The next SELECT produces a report like the above using the MSUM, however, it is sorting and breaking on both the Product ID and on the sale date and limits the scope of the processing to products 1000 and 2000:

14 Rows Returned

limits the scope of the processing to products 1000 and 2000
limits the scope of the processing to products 1000 and 2000


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Teradata Topics