When working with T-SQL, it’s often desirable to answer various questions at different levels of granularity. Although it’s interesting to know the price of each line item in an order, that information might be more valuable in the form of a total for the entire order. And at the end of the quarter, the sales team might want to know the total for all orders placed during the previous 3 months; or the average total order price; or the total number of visitors to the web site who made a purchase, divided by the total number of visitors, to calculate the percentage of visitors who bought something.
Each of these questions can be easily answered using T-SQL aggregate functions such as SUM, AVG, and COUNT. But there are many questions that are much more difficult to answer with the builtin aggregates. For example, what is the median of the total order prices over the last 90 days? What is the average order price, disregarding the least and most expensive orders? These are examples of questions that are difficult to answer with T-SQL aggregates. For instance, the standard algorithm for finding a median involves sorting and counting the set of values, and then returning the value in the middle of the sorted set. Translated to SQL Server, this would most likely mean using a cursor, walking over the result set to find the count, and then backtracking to get the correct value. And while that is a workable solution for a single group, it is not easy to adapt to multiple groups in the same rowset. Imagine writing that cursor to find the median sales amount for every salesperson, split up by month, for the last year—not a pretty picture.
User-defined aggregates eliminate this problem by giving developers tools to create custom aggregation functions in the .NET language of their choice. These aggregate functions are built to be robust and extensible, with built-in consideration for parallelism and flags that control behavior such that the query optimizer can better integrate the aggregations into query plans. User-defined aggregates can provide powerful support for operations that were previously extremely difficult in SQL Server.
In this section, we’ll examine a CLR user-defined aggregate that calculates a “trimmed” mean— an average of a set of numbers minus the smallest and largest input values.
Adding a User-Defined Aggregate to a SQL Server Project
To add a user-defined aggregate to a preexisting SQL Server project in Visual Studio 2008, right-click the project name in Solution Explorer and select Add ➤Aggregate, as shown in Figure.
Adding a user-defined aggregate to a SQL Server project
Once the aggregate has been added to the project, Visual Studio will add template code. The following code is the result of adding an aggregate called TrimmedMean:using System;
Parts of a User-Defined Aggregate
Programming a user-defined aggregate is in many ways similar to programming user-defined types. Both aggregates and types are represented by classes or structures that are serializable. It is important to understand when dealing with aggregates that the intermediate result will be serialized and deserialized once per row of aggregated data. Therefore, it is imperative for performance that serialization and deserialization be as efficient as possible.
The SqlUserDefinedAggregate attribute, much like the SqlUserDefinedType attribute, functions primarily as a way for developers to control serialization behavior. However, the attribute also exposes parameters that can allow the query optimizer to choose better query plans depending on the data requirements of the aggregate. The parameters exposed by the attribute are as follows:
The life of an instance of an aggregate begins with a call to Init. Within this method, any private members should be initialized to the correct placeholder values for having processed no rows. There is no guarantee that any data will ever be passed into the aggregate just because Init was called. Care should be taken to ensure that this assumption is never coded into an aggregate. An instance of an aggregate can be reused multiple times for different groups within the result set, so Init should be coded to reset the entire state of the aggregate.
The Accumulate method takes a scalar value as input and appends that value, in the correct way, to the running aggregation. That scalar value is an instance of whatever type is being aggregated. Since these values are coming from the SQL Server engine, they are nullable, and since the method itself has no control over which values are passed in, it must always be coded to properly deal with nulls. Remember that even if the column for the input to the aggregation is defined as NOT NULL, a NULL can result from an OUTER JOIN or a change in project requirements.
In some cases, query plans can go parallel. This means that two or more operations can occur simultaneously—including aggregation. There is a chance that some aggregation for a given aggregate of a given column will take place in one thread, while the rest will take place in other threads. The Merge method takes an instance of the aggregate as input and must append any intermediate data it contains into its own instance’s member data.
The final call in an aggregate’s life is Terminate. This method returns the end result of the aggregation.
Programming the TrimmedMean Aggregate
T-SQL has long included the AVG aggregate for calculating the mean value of a set of inputs. This is generally quite useful, but for statistical purposes, it’s often desirable to eliminate the greatest and least values from a mean calculation. Unfortunately, doing this in pure T-SQL is quite difficult, especially if the query also includes other aggregates that should not exclude the rows with the greatest and least amounts. This is a classic problem, and it’s the kind that CLR user-defined aggregates excel at solving.
To calculate the mean value excluding the maximum and minimum values, the aggregate must keep track of four values:
The final output value can be calculated by subtracting the minimum and maximum values from the running sum, and then dividing that number by the count, minus 2 (to account for the subtracted values). The following private member variables will be used to keep track of these values:private int numValues;
The Init method will prepopulate each of these variables with the appropriate values. numValues and totalValue will both be initialized to 0, starting the count. minValue will be initialized to SqlMoney.MaxValue, and maxValue to SqlMoney.MinValue. This will ease development of comparison logic for the initial values entered into the aggregate. Note that the SqlMoney type is used for this example to facilitate taking averages of order data in the AdventureWorks database. Other applications of such an aggregate may require different types. The following code is the implementation of Init for this aggregate:public void Init()
So that the aggregate behaves similarly to intrinsic SQL Server aggregates like AVG, it’s important that it ignore nulls. Therefore, the Accumulate method should increment the numValues variable only if the input is non-null. The following code implements Accumulate for this aggregate:public void Accumulate(SqlMoney Value)
Implementing Merge is very similar to implementing Accumulate, except that the value comes from another instance of the aggregate instead of being passed in from the query engine.public void Merge(TrimmedMean Group)
The final step in coding the aggregate is to define the Terminate method. Since the lowest and highest input values will be ignored, the output will be null if numValues is less than 3; it is impossible to ignore values that don’t exist! Aside from that, the algorithm employed is as described previously: divide the total value by the number of values after subtracting the minimum and maximum.public SqlDecimal Terminate()
Since the aggregate uses only value types as member variables, native serialization will suffice, and the default SqlUserDefinedAggregate attribute will not need to be modified. The complete code for the aggregate follows:using System;
Using the TrimmedMean Aggregate
Once deployed to the database, user-defined aggregate functions can be used just like built-in aggregates.For instance, to compare the results returned by the T-SQL AVG function to those returned by TrimmedMean for the total order amounts in the AdventureWorks database, you can use the following query:SELECT
The results of this query show a slightly lower average for the trimmed figure: $4,464.88 instead of $4,471.28 for the normal average.
Managing User-Defined Aggregates
If an assembly has been loaded into the database using CREATE ASSEMBLY, aggregates can be created or dropped without using the Visual Studio deployment task.
To create an aggregate that is exposed in an assembly, use CREATE AGGREGATE and specify the name of the assembly and the name of the structure or class that defines the aggregate. The following code creates the TrimmedMean aggregate from an assembly called Aggregates:CREATE AGGREGATE TrimmedMean
To drop an aggregate, use DROP AGGREGATE. The following code drops the TrimmedMean aggregate:
DROP AGGREGATE TrimmedMean
There is no catalog view dedicated to aggregates, but some data is exposed in the sys.objects view. To get information about user-defined aggregates, filter the type column for the value AF.SELECT *
Multiple-Input User-Defined Aggregates
Up until SQL Server 2008, user-defined aggregates could allow only one input parameter, such as AVG(TotalDue) and dbo.TrimmedMean(TotalDue) in the previous examples. Now it is possible to passmore than one parameter through your user-defined aggregate.
To demonstrate, we will work through an example that uses a table called MyValue, defined formally as follows:CREATE TABLE MyValue
We want to create an aggregate that will take the values of both columns and multiply them for each row. It will then perform a sum of the result of these products (products with regard to the mathematics definition—not something people buy). Our aggregate would ideally work as follows:
The code for this simple aggregate is as follows:using System;
Although this example is simple and the same effect could be achieved using other programmingconcepts, the important point to note here is the ease of passing multiple inputs into your aggregate function. Now let’s create our assembly and aggregate in SQL, as follows:CREATE ASSEMBLY MyAggregates
Now we can issue SELECT dbo.SUM_ALL(v1,v2) FROM MyValue and obtain our expected value.55.0000
SQL Server 2008 Related Interview Questions
|SQL Server 2000 Interview Questions||MSBI Interview Questions|
|SQL Server 2008 Interview Questions||SQL Server 2005 Interview Questions|
|SSIS(SQL Server Integration Services) Interview Questions||SSRS(SQL Server Reporting Services) Interview Questions|
|Microsoft Entity Framework Interview Questions||LINQ Interview Questions|
|SQL Server Analysis Services (SSAS) Interview Questions||Sql Server Dba Interview Questions|
SQL Server 2008 Related Practice Tests
|SQL Server 2000 Practice Tests||MSBI Practice Tests|
|SQL Server 2008 Practice Tests||SQL Server 2005 Practice Tests|
|SSIS(SQL Server Integration Services) Practice Tests||SSRS(SQL Server Reporting Services) Practice Tests|
|Microsoft Entity Framework Practice Tests||LINQ Practice Tests|
Sql Server 2008 Tutorial
Sql Server 2008 Overview
Sql Server Installation And Configuration
Sql Server Encryption
Automation And Monitoring
Integrated Full-text Search
New Datatypes In Sql Server 2008
T-sql Enhancements For Developers
T-sql Enhancements For Dbas
Sql Server And Xml
Sql Server Xml And Xquery Support
Linq To Sql
All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
Wisdomjobs.com is one of the best job search sites in India.