Partitioning Data Using the QUANTILE Function Teradata

Compatibility: Teradata Extension

A Quantile is used to divide rows into a number of partitions of roughly the same number of rows in each partition. The percentile is the QUANTILE most commonly used in business. This means that the request is based on a value of 100 for the number of partitions. It is also possible to have quartiles (based on 4), tertiles (based on 3) and deciles (based on 10).

By default, both the QUANTILE column and the QUANTILE value itself will be output in ascending sequence. As in some cases, the ORDER BY clause may be used to reorder the output for display. Here the order of the output does not change the meaning of the output, unlike a summation where the values are being added together and all need to appear in the proper sequence.

The syntax of the QUANTILE function is:

The next SELECT determines the percentile for every row in the Sales table based on the daily sales amount and sorts it into sequence by the value being partitioned, in this case the daily sales amount:


8 Rows Returned

percentile for every row in the Sales table based on the daily sales amount
percentile for every row in the Sales table based on the daily sales amount

Notice that the amount of 32800.50 in the first two rows has the same percentile value. They are the same value and will therefore be put into the same partition.

The next SELECT uses a DESC in the sort list of the QUANTILE function:


9 Rows Returned

DESC in the sort list of the QUANTILE function

Notice that the only difference, between these two example outputs is in the first two rows of the second example. This is because the Sale date DESC, impacts the first two rows. Why?

Since these rows have the same value, it uses the Sale_Date column as a tiebreaker for the sequencing and makes them different from each other. Hence, they are assigned to different values in different partitions.

QUALIFY to Find Products in the top Partitions

Like the aggregate functions, OLAP functions must read all required rows before performing their operation. Therefore, the WHERE clause cannot be used. Where the aggregates use HAVING, the OLAP functions uses QUALIFY. The QUALIFY evaluates the result to determine which ones to return.

The following SELECT uses a QUALIFY to show only the products that sell in the top 60 Percentile:


8 Rows Returned

QUALIFY to Find Products in the top Partitions

Although ascending is the default sequence for both the QUANTILE and data value, many people think of the percentile with the highest number being best and therefore wish to see it listed first.

The following modifies the previous SELECT to incorporate the ORDER BY to obtain a different sequence in the answer set:


8 Rows Returned

previous SELECT to incorporate the ORDER BY to obtain a different sequence in the answer set

The ORDER BY changes the sequence of the rows being listed, not the meaning of the percentile. The above functions both determined that the highest number in the column is the highest percentile. The data value sequence ascends as the percentile ascends or descends as the percentile descends.

When the sort in the QUANTILE function is changed to ASC the data value sequence changes to ascend as the percentile descends. In other words, the sequence of the percentile does not change, but the data value sequence is changed to ascend (ASC) instead of the default, which is to descend (DESC).

The following SELECT uses the ASC to cause the data values to go contradictory to the percentile:


8 Rows Returned

SELECT uses the ASC to cause the data values to go contradictory to the percentileSELECT uses the ASC to cause the data values to go contradictory to the percentile

The next SELECT modifies the above query to incorporate the Product ID as a tiebreaker for the two rows with sales of $32,800.50:


8 Rows Returned

above query to incorporate the Product ID

Although the previous queries were all based on percentiles (100) other values can be used with interesting results.

The following SELECT uses a quartile (QUANTILE based on 4 partitions):


21 Rows Returned

quartile (QUANTILE based on 4 partitions)
quartile (QUANTILE based on 4 partitions)

In the above output, there were 21 rows. The highest RANK is 21, the lowest is 1 and everything seems correct. Although it is correct, notice the fact that product 1000 on October 2 and product 2000 on October 4 each had sales of $32,800.50. Therefore, they both rank the same as 16 with 17 not seen in the output. The two values tied for the 16th position.



Face Book Twitter Google Plus Instagram Youtube Linkedin Myspace Pinterest Soundcloud Wikipedia

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Teradata Topics