Using Aggregate Functions - T-SQL

Aggregate functions fall into two categories: simple aggregates and statistical functions. The simplest technique for using aggregate functions is aggregating all rows in a query. Aggregate functions include the means to summarize a range of values in a variety of ways. For example, you may simply want to count the rows that match certain criteria or get the sum for a range of numeric values. The following table contains all the system-supplied aggregate functions supported by T-SQL used to summarize column values.

Using Aggregate Functions

An aggregate function is called and applied to a specific column. The aggregate function is applied to the range of values in that column, either for the entire table or a group of rows when used along with a GROUP BY clause. Most aggregates work on numeric values, but some, such as the COUNT, MIN, and MAX functions, can work with other data types. The following sections break down each aggregate function, with examples and instructions for you to try it out for yourself.

The COUNT() Function

When you just need to know how many records there are in a table, or how many records share common attribute values, use the COUNT() function. It simply counts rows or non-null values in a column's value range. Because the data type of the column isn't considered, it will work with columns of practically any type of data.

Consider the following two examples. If you execute this query against the Product table, the total number of rows is returned:

SELECT COUNT(*) FROM Production.Product

As you can see, the Product table contains 504 rows (unless you've added or removed any rows). Now, count only the values in the Color column using the following expression:

SELECT COUNT(Color) FROM Production.Product

Because 248 records don't have a Color value (these rows contain the value NULL for this column), only 256 rows get counted. Now add the word DISTINCT before the column reference and execute the query again:

SELECT COUNT(DISTINCT Color) FROM Production.Product

Because so many of the products have the same prices, only nine records are counted. The DISTINCT modifier can be used with any of the aggregate functions except when using the CUBE or ROLLUP statements, which are discussed later in this chapter.

The SUM() Function

The SUM() function simply returns the sum of a range of numeric column values. Like the others, this function only considers non-null values. A simple example returns the subtotal for a product order. This query adds up the UnitPrice for each detail line in the order whose SalesOrderID is 50189:

SELECT SUM(UnitPrice) FROM Sales.SalesOrderDetail WHERE SalesOrderID = 50189

The result is a single row with a single column just like the previous examples, as shown in Figure (Show In Figure Below). This is the sum of all UnitPrice values in the SalesOrderDetail table.

The SUM() Function

have two issues with this result. The first is that the column doesn't have a name. When applying aggregate functions, the resulting column won't be named unless you specifically define an alias for the column name. If you use visual query design tools, such as Access or the T-SQL Designer (in Visual Studio or to create a view in SQL Server Management Studio), these tools will devise column aliases such as SumOfUnitPrice or Expr1. The first order of business is to assign an alias so that this column has a sensible name. The other problem with this simple example is that it assumes that the customer purchased one of each product. The fact is that there are three detail rows for this order with respective quantities 1, 3, and 4. To total the order accurately, you'll have to do a little math. This query resolves both of these issues, calculating extended price and defining an alias for the column:

SELECT SUM(UnitPrice * OrderQty) AS OrderTotalPrice FROM Sales.SalesOrderDetail WHERE SalesOrderID = 50189

The result shown in Figure (Show In Figure Below) contains the correct amount (the total of all three order detail rows, considering the quantity for the product purchased), and the column has a name.


The AVG() Function

The AVG() function returns the calculated average for a range of numeric values. Internally, the query processor calculates the sum of all the values and then divides by the number of rows in the range (containing non-null values). Optionally, the AVG() function can make a distinct selection of values and then perform the same calculation on this abbreviated set of values. Using a distinct selection can greatly affect the results and is not as common.

I'd like to use the product sales data in the AdventureWorks2008 database to demonstrate the practical application of these aggregate functions. In this scenario, the director of marketing has asked for an analysis of road bike sales in 2003. This information exists in three related tables. Pay no attention to the join statements for the time being; they are covered in Chapter 8. The following query uses the SalesOrderHeader table to filter the sales order, the Product table to filter by ProductSubCategoryID (2 is road bikes), and the UnitPrice is retrieved from the SalesOrderDetail table. For simplicity, I'm not considering the quantity of bikes purchased.

I'll start with the lowest price paid for a bike. Using the MIN() function should return only one value:

You can see that the lowest UnitPrice value in this range is $296.99. (I've rounded the results to the nearest penny.) Just modify the query, substituting the following functions in place of the MIN() function in the example. The following table shows the results.


All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd Protection Status

T-SQL Topics