Notable DML Features - SQL Server 2008

DML in SQL Server continues to evolve throughout each release. The following features are available to make querying against SQL Server easier for the user:

  • Old-style outer joins deprecated:"=*" in a WHERE clause will raise an error.
  • Common table expressions: This ANSI 99 feature allows for recursion and code simplification.
  • TOP: This operator allows you to do parameterized row counts, and works with more DML than just SELECT.
  • FROM clause extensions: Join types improve usability of derived tables and table-based functions.
  • OUTPUT clause: Use this clause to get information about data that has changed.
  • Ranking functions: Find positional information within a result set through these functions.
  • EXCEPT and INTERSECT: These set operators provide the ability to compare sets of like data.
  • Synonyms: These give you the ability to provide alternate names for database objects.
  • MERGE: This allows users to express multiple DML actions (such as inserts and updates) in one statement

Each of these changes (especially the common table expressions) will make the base DML statements (INSERT, UPDATE, DELETE, and SELECT) far more powerful and functional.

Old-Style Outer Joins Deprecated
This one should probably be in the title of the book, as it is going to be the most painful of all changes. Microsoft has said over and over, and book authors and experts (not always the same group, mind you) have said over and over: stop using non-ANSI-style joins; they will not be supported in a future version. Well, the future is now. Consider the following query:

USE AdventureWorks --this is the default unless otherwise mentioned
GO
SELECT *
FROM Sales.salesPerson AS salesPerson,
Sales.salesTerritory AS salesTerritory
WHERE salesPerson.territoryId *= salesTerritory.territoryId

Trying this query will give you the following (really long and very descriptive) error message that says it as good as we could:

Msg 4147, Level 15, State 1, Line 5
The query uses non-ANSI outer join operators ("*=" or "=*").

To run this query without modification, please set the compatibility level for current database to 80 or lower, using stored procedure sp_dbcmptlevel. It is strongly recommended to rewrite the query using ANSI outer join operators (LEFT OUTER JOIN, RIGHT OUTER JOIN). In the future versions of SQL Server, non-ANSI join operators will not be supported even in backward-compatibility modes.

So what does this mean? It means that you need to rewrite your queries using ANSI join operators:

SELECT *
FROM Sales.salesPerson AS salesPerson
LEFT OUTER JOIN sales.salesTerritory AS salesTerritory
ON salesPerson.territoryId = salesTerritory.territoryId

Note that this restriction is only for outer joins, not for inner joins. The following works just fine and likely always will:

SELECT *
FROM Sales.salesPerson AS salesPerson,
Sales.salesTerritory AS salesTerritory
WHERE salesPerson.territoryId = salesTerritory.territoryId

Clearly, you should code this using an INNER JOIN, but the preceding will work because this syntax is required for correlated subqueries.

Common Table Expressions

The primary purpose of common table expressions (CTEs) is code simplification. They are loosely analogous to views in that they are defined as a single SELECT statement.

Once defined, CTEs are used exactly like views. To illustrate, we’ll take a look at a very simple example. The following query defines a CTE named simpleExample that contains a single column and row. Then a simple SELECT statement is issued to return the data defined by the CTE.

WITH simpleExample AS
(
SELECT 'hi' AS columnName
)
SELECT columnName
FROM simpleExample

which returns the following:

columnName
----------
hi

What makes the CTE significantly different from a view is that the CTE is not created as an object in the database and therefore is only available for this single statement. So in this sense, it will be treated by the compiler more or less as if you had coded a derived table (a named table expression that exists only for the duration of a query) as follows:

SELECT columnName
FROM (SELECT 'hi' AS columnName) AS simpleExample

This is an obviously simple example, but it serves to illustrate the basics of CTEs. If you needed to reference a given derived table in your query multiple times, this method would be of great help, since instead of recoding the query over and over, you would simply reference the CTE name. If the derived table were very large, it would greatly simplify the final query, so debugging the final result will be much easier.

The performance of the CTE should be on par with using derived tables. If you have queries where you use multiple derived queries, such queries will be evaluated multiple times. The same would be true for CTEs. In some cases, it will be better to use a temporary table to store the results of the query that you will use in the CTE, especially for complex CTEs that are used multiple times in a query.

There are two common uses for CTEs:

  • Simplify complex queries: To encapsulate complex code in a way to make code cleaner
  • Create recursive queries: To implement hierarchies traversing code in a single query (something not possible previously)

Simplifying Complex Queries

Consider the following scenario: the client needs a query that has subqueries to calculate salesperson totals for the year to date, compare that value to entire sales for the company, and then compare the salespeople’s sales to their quota for the year.
First, consider how this would be done using the SQL Server 2000 syntax. Each subset could be implemented as a derived table in a single query.

--In SQL Server 2000
SELECT CAST(c.LastName + ', ' + c.FirstName AS varchar(30)) AS SalesPerson
,
--YEAR TO DATE SALES
(SELECT amount
FROM ( SELECT soh.SalesPersonID, sum(sod.LineTotal) as amount
FROM Sales.SalesOrderHeader soh
JOIN Sales.SalesOrderDetail sod
ON sod.SalesOrderID = soh.SalesOrderID
WHERE soh.Status = 5 -- complete
and soh.OrderDate >= '20040101'
GROUP by soh.SalesPersonID) as YTDSalesPerson
WHERE YTDSalesPerson.salesPersonId = salesperson.SalesPersonID) AS YTDSales,
--PERCENT OF TOTAL
(SELECT amount
FROM (SELECT soh.SalesPersonID, sum(sod.LineTotal) AS amount
FROM Sales.SalesOrderHeader soh
JOIN Sales.SalesOrderDetail sod
ON sod.SalesOrderID = soh.SalesOrderID
WHERE soh.Status = 5 -- complete
AND soh.OrderDate >= '20040101'
GROUP BY soh.SalesPersonID) AS YTDSalesPerson
WHERE YTDSalesPerson.salesPersonId = salesperson.SalesPersonID) /
(SELECT SUM(amount)
FROM (SELECT soh.SalesPersonID, sum(sod.LineTotal) AS amount
FROM Sales.SalesOrderHeader soh
JOIN Sales.SalesOrderDetail sod
ON sod.SalesOrderID = soh.SalesOrderID
WHERE soh.Status = 5 -- complete
AND soh.OrderDate >= '20040101'
GROUP BY soh.SalesPersonID) AS YTDSalesPerson
) AS percentOfTotal,
--COMPARE TO QUOTA
(SELECT amount
FROM (SELECT soh.SalesPersonID, sum(sod.LineTotal) AS amount
FROM Sales.SalesOrderHeader soh
JOIN Sales.SalesOrderDetail sod
ON sod.SalesOrderID = soh.SalesOrderID
WHERE soh.Status = 5 -- complete
AND soh.OrderDate >= '20040101'
GROUP BY soh.SalesPersonID)AS YTDSalesPerson
WHERE YTDSalesPerson.salesPersonId = salesperson.SalesPersonID) -
salesPerson.SalesQuota AS MetQuota
FROM Sales.SalesPerson AS salesPerson
JOIN HumanResources.Employee AS e
ON salesPerson.salesPersonId = e.employeeId
JOIN Person.Contact as c
ON c.contactId = e.contactId

Of real interest here are the bold parts of the code. It is the same subquery over and over. This is a beast of a query to follow, and not overly pleasant to write. You may also be thinking that each of the derived tables could be implemented as a view, but if this is the only situation where they will be used, the management overhead of implementing three views in a production environment would not be worth the effort. You might also use a temp table, which is what we generally end up doing when we come upon this sort of situation. This is not necessarily the best way to implement something so complex in SQL Server 2000, but it was the only way to do it in one statement (a goal for writing powerful SQL, but not always readable or maintainable code).

Now let’s reformulate this query using the new CTE syntax. As demonstrated in the following query, you define two CTEs to replace the derived table and the main table to produce the exact same results, with semantically the exact same query as in the previous example, only clearer to understand.

-- SQL Server CTE syntax
WITH YTDSalesPerson
AS
(
SELECT soh.SalesPersonID, SUM(sod.LineTotal) AS amount
FROM Sales.SalesOrderHeader soh
JOIN Sales.SalesOrderDetail sod
ON sod.SalesOrderID = soh.SalesOrderID
WHERE soh.Status = 5 -- complete
AND soh.OrderDate >= '20040101'
GROUP BY soh.SalesPersonID
),
SalesPersonInfo
AS
(
SELECT salesPersonId, SalesQuota AS salesQuota,
CAST(c.LastName + ', ' + c.FirstName AS varchar(30)) AS SalesPerson
FROM Sales.SalesPerson AS s
JOIN HumanResources.Employee AS e
ON s.salesPersonId = e.employeeId
JOIN Person.Contact AS c
ON c.contactId = e.contactId
)
SELECT SalesPersonInfo.SalesPerson,
(SELECT amount
FROM YTDSalesPerson
WHERE YTDSalesPerson.salesPersonId = salespersonInfo.SalesPersonID)
AS YTDSales,
(SELECT amount
FROM YTDSalesPerson
WHERE YTDSalesPerson.salesPersonId = salespersonInfo.SalesPersonID)
/ (SELECT SUM(amount) FROM YTDSalesPerson) AS percentOfTotal,
(SELECT amount
FROM YTDSalesPerson
WHERE YTDSalesPerson.salesPersonId = salespersonInfo.SalesPersonID) -
salesPersonInfo.SalesQuota AS MetQuota
FROM SalesPersonInfo

While this is still no beginner query, it is much easier to read and understand than the previous version. Not to mention that you can now test the CTE code by simply executing the following:

WITH YTDSalesPerson
AS
(
SELECT soh.SalesPersonID, SUM(sod.LineTotal) AS amount
FROM Sales.SalesOrderHeader soh
JOIN Sales.SalesOrderDetail sod
ON sod.SalesOrderID = soh.SalesOrderID
WHERE soh.Status = 5 -- complete
AND soh.OrderDate >= '20040101'
GROUP BY soh.SalesPersonID
)
SELECT *
FROM YTDSalesPerson

You can use this instead of having to deal with pulling out large sections of code from your original query. And say you make a correction or change to the calculation. It changes once, and all usages change. All this without having to persist any data or objects. SQL Server does the real work! Consider also that the “black boxed” CTE works independently of the main query, so after testing it, you won’t need to worry if this part of the query works as you debug the larger query.

Using CTEs for Recursive Queries

The second use of CTEs allows for recursive hierarchies to be navigated without the need for complex recursion logic done manually. Hierarchies are common in real-world situations, such as employees and managers (managers are employees), or in manufacturing with parts within other parts (an automobile engine is a part, but it also consists of a lot of other parts inside).

Using T-SQL, it was previously complicated to build queries that dealt with hierarchies. If the number of levels in a hierarchy was known beforehand, it was possible to use self joins, but even that approach was very cumbersome. If you had unlimited numbers of levels, it was practically impossible. CTEs can make hierarchies manageable.

BASIC TREE HANDLING IN T-SQL

BASIC TREE HANDLING IN T-SQL
Take a look at an example that uses the Employee table in the AdventureWorks database. This is a classic example of a single-parent hierarchy with the managerId column identifying all employees who report to a specified manager. Let’s look at a typical breadthwise hierarchical query used in SQL Server 2000 to retrieve all employees who reported to the manager whose managerId was 140.

-- SQL Server 2000 example
DECLARE @managerId int
SET @managerId = 140
--holds the output tree level, which lets us isolate a level in the looped query
DECLARE @outTable table (employeeId int, managerId int, treeLevel int)
--used to hold the level of the tree we are currently at in the loop
DECLARE @treeLevel as int
SET @treelevel = 1
--get the top level
INSERT INTO @outTable
SELECT employeeId, managerId, @treelevel as treelevel
FROM HumanResources.employee AS Employee
WHERE (Employee.managerId = @managerId)
WHILE (1 = 1) --imitates do...until construct
BEGIN
INSERT INTO @outTable
SELECT Employee.employeeId, Employee.managerId,
treelevel + 1 AS treelevel
FROM HumanResources.employee AS Employee
JOIN @outTable AS ht
ON Employee.managerId = ht.employeeId
--this where isolates a given level of the tree
WHERE EXISTS( SELECT *
FROM @outTable AS holdTree
WHERE treelevel = @treelevel
AND Employee.managerId = holdtree.employeeId)
IF @@rowcount = 0
BEGIN
BREAK
END
SET @treelevel = @treelevel + 1
END
SELECT Employee.employeeid,Contact.lastname, Contact.firstname
FROM HumanResources.Employee AS Employee
INNER JOIN @outTable ot
ON Employee.employeeid = ot.employeeid
INNER JOIN Person.Contact AS Contact
ON Contact.contactId = Employee.contactId

Using CTEs, however, gives a much cleaner implementation:

-- SQL Server 2008 syntax
DECLARE @managerId int
SET @managerId = 140;
WITH EmployeeHierarchy (employeeid, managerid)
AS
(
SELECT employeeid, managerid
FROM HumanResources.Employee AS Employee
WHERE ManagerID=@managerId
UNION ALL
SELECT Employee.employeeid, Employee.managerid
FROM HumanResources.Employee AS Employee
INNER JOIN EmployeeHierarchy
ON Employee.managerid= EmployeeHierarchy.employeeid)
SELECT Employee.employeeid,Contact.lastname,Contact.firstname
FROM HumanResources.Employee AS Employee
INNER JOIN EmployeeHierarchy
ON Employee.employeeid = EmployeeHierarchy.employeeid
INNER JOIN Person.Contact AS Contact
ON Contact.contactId = Employee.contactId

So let’s take this query apart and look at how it works. First, we define the name of the CTE and define the names of the columns that will be output:

WITH EmployeeHierarchy (employeeid, managerid)
AS
(

The first query gets the top level in the hierarchy. It is not required that it return only a single row, but in our case, we are getting all rows where the managerId= 140; that is, all people who work for managerId = 140. We are using a variable here to keep it equivalent to the SQL Server 2000 example.

SELECT employeeid, managerid
FROM HumanResources.Employee AS Employee
WHERE managerid=@managerId

Then use UNION ALL to connect these rows to the next specified set:

UNION ALL

UNION ALL does not specify duplicates to be removed, so you will need to be careful to avoid cyclical relationships in your data. The problem comes when you might have a child row that may also be a predecessor to itself.

For example, consider the following table of data:

BASIC TREE HANDLING IN T-SQL

The manager of employee 1 is employee 3. Then when you get the manager for employee 3, it is employee 5, and then employee 5’s manager is 1. So when you get to the next row, you would end up in an infinite loop, because you would find that the manager of 1 is 3, and you would just keep going until you hit the nesting limit of SQL Server code (32).

So now we get to the cool part. We get the children of the rows we got in the first part of the query by the name of the CTE—in our case, EmployeeHierarchy. This is how the recursion works. It joins to itself, but again contains itself (hence, the recursive nature of a query calling itself ). It continues on until no employeeid is a manager of anyone else (down to where the real work gets done!).

SELECT Employee.employeeid, Employee.managerid
FROM HumanResources.Employee AS Employee
INNER JOIN EmployeeHierarchy
on Employee.managerid= EmployeeHierarchy.employeeid
)

Finally, we use the CTE. The loop happens without us even noticing from the outside by materializing the set of rows, then joining that to the full Employee table. The result is all persons who report to a given manager.

SELECT Employee.employeeid,Contact.lastname,Contact.firstname
FROM HumanResources.Employee AS Employee
INNER JOIN EmployeeHierarchy
ON Employee.employeeid = EmployeeHierarchy.employeeid
INNER JOIN Person.Contact AS Contact
ON Contact.contactId = Employee.contactId

Before we leave CTEs altogether, there is one more really neat thing that you can do with them. In the query that gets the top level of the hierarchy, you can introduce values into the CTE that you can manipulate throughout the iterations of the query. The following query extends the query to include a level indicator and a hierarchy of the employee’s managers, up to the top manager for the employee. In the recursive part of the CTE, you put code to add one for each level. In our case, we increment the tree level by one, and then we add the employeeid from the level to the hierarchy. We use varchar(max) since we don’t know just how big the hierarchy will be.

DECLARE @managerId int
SET @managerId = 140;
WITH EmployeeHierarchy(employeeid, managerid, treelevel, hierarchy)
AS
(
SELECT employeeid, managerid,
1 AS treeevel, CAST (employeeid AS varchar(max)) AS hierarchy
FROM HumanResources.Employee AS Employee
WHERE managerid=@managerId
UNION ALL
SELECT Employee.employeeid, Employee.managerid,
treelevel + 1 AS treelevel,
hierarchy + '' +CAST(Employee.employeeid AS varchar(20)) AS hierarchy
FROM HumanResources.Employee AS Employee
INNER JOIN EmployeeHierarchy
ON Employee.managerid= EmployeeHierarchy.employeeid
)
SELECT Employee.employeeid,Contact.lastname,Contact.firstname,
EmployeeHierarchy.treelevel, EmployeeHierarchy.hierarchy
FROM HumanResources.Employee AS Employee
INNER JOIN EmployeeHierarchy
ON Employee.employeeid = EmployeeHierarchy.employeeid
INNER JOIN Person.Contact AS Contact
ON Contact.contactId = Employee.contactId
ORDER BY hierarchy

Running this returns the following:

Running this returns
So now we have our hierarchy, it is sorted by manager, and we know how far away an employee is from the manager we passed in as a parameter.

CTEs will be helpful in cleaning up code for sure, but more important, recursive queries will be far easier to deal with than ever before.

TOP

The TOP operator is used to specify the number of rows returned by a query. In previous versions of SQL Server, TOP required a literal. So if you needed to parameterize the number of values to return when a statement was coded, the alternative was to use SET ROWCOUNT @<variableName> to restrict the rows returned. This, however, affected all following T-SQL statements. It was also an issue that you needed to reset ROWCOUNT to 0 after every statement you used it on, or big problems would occur when you started affecting only a couple of rows in subsequent statements.

TOP affects only a single statement, and now it allows a variable to set the number of rows affected. For example, you can return a variable set of rows based on a variable from the AdventureWorks Person.Contact table.

DECLARE @rowsToReturn int
SELECT @rowsToReturn = 10
SELECT TOP(@rowsToReturn) * --note that TOP requires parentheses to accept
--parameters but not for constant
FROM HumanResources.Employee

If the value of the TOP parameter is invalid, such as a null value or a negative value, SQL Server will return an error. In SQL Server, you may update views that are defined using the TOP operator. However, after you update the values that show up in the view, the rows may vanish from the view.

Beyond parameterization, another change to the TOP operator is that it will work with INSERT, UPDATE, and DELETE (again with parameters if desired).

For example, say you have a table and you want to insert the top five rows from a result set (in this case, one that just has seven values from a UNION statement). You can do this with INSERT TOP (N):

CREATE TABLE testTop
(
value int PRIMARY KEY
)
INSERT TOP (5) INTO testTop
SELECT * --this derived table returns seven rows
FROM (SELECT 1 AS value UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
UNION SELECT 5 UNION SELECT 6 UNION SELECT 7) AS sevenRows
GO
SELECT *
FROM testTop
GO

The derived table contains seven rows, but the TOP operator on the INSERT statement means that this will return only five of them:

value
-----------
1
2
3
4
5

Now, you can use TOP on an UPDATE to change two rows only:

UPDATE TOP (2) testTop
SET value = value * 100
SELECT *
FROM testTop

Which returns the following:

value
-----------
3
4
5
100
200

Finally, using DELETE, you can remove three rows from the table:

DELETE TOP(3) testTop
GO
SELECT * FROM testTop

which results in this output:

value
-----------
100
200

Each of these techniques is basically only good when you are trying to batch-modify a set. So you can do something like this:

INSERT TOP (10) otherTable (batchTableId, value)
FROM batchTable
WHERE NOT EXISTS ( SELECT *
FROM otherTable
WHERE otherTable.batchTableId = batchTable.batchTableId)

It is not a technique that will be extremely useful, but it is a much more elegant solution than using SET ROWCOUNT, because it is clear to the query processor from which statement you intend to limit the number of rows. Any other statements in any subordinate objects like triggers do not need to be subject to the limitation, as was the case with ROWCOUNT.

This example has a pretty big “gotcha” that may not be completely obvious to you. Notice there was no mention of which rows would be modified or deleted. As cannot be stated too many times, tables are unordered sets, and modification statements have no ORDER BY clause to order the set they deal with. It will literally just INSERT, UPDATE, or DELETE whatever number of rows you state, purely at the will of the optimizer. (SQL Server Books Online goes so far as to state that the rows affected when TOP is used will be “random,” which, while technically true, is perhaps pushing it.) It stands to reason that whichever rows are easiest to modify, these rows will be the ones chosen. This is unlike how the SELECT statement returns the rows based on an ORDER BY clause if one is present, and it can be a bit confusing. Consider this example:

CREATE TABLE top10sales
(
salesOrderId int,
totalDue MONEY
)
INSERT TOP (10) top10sales
SELECT salesOrderId, totalDue
FROM Sales.salesOrderHeader
ORDER BY totalDue DESC

Ten rows from the Sales.SalesOrderHeader table will be returned and inserted, but they will not be the top ten highest values. For this, you can use the following:

INSERT top10sales
SELECT TOP (10) salesOrderId, totalDue
FROM Sales.salesOrderHeader
ORDER BY totalDue DESC

Now the values in the top10sales table are the top ten values in Sales.SalesOrderHeader.totalDue. The value of adding TOP to the INSERT, UPDATE, and DELETE statements is to facilitate batching operations. For example, you might want to delete a large number of rows from a table, a few at a time, to keep the transaction small, thereby assisting in maintaining concurrent use of the table, with a small number of locks. So you could execute the following:

BEGIN TRANSACTION
DECLARE @rowcount int
SET @rowcount = 100
WHILE (@rowcount = 100)-- if it is less than 100,we are done, greater than 100
BEGIN --cannot happen
DELETE TOP(100) sales.salesOrderHeader
SET @rowcount = @@rowcount
END
ROLLBACK TRANSACTION --we don't want to actually delete the rows

Note that using TOP is the preferred way of acting on only a fixed number of rows, rather than using SET ROWCOUNT <N>. SET ROWCOUNT is deprecated and will be removed in the next version of SQL Server.

So, the TOP operator accepts variables as well as literal values and can be used with INSERT, UPDATE, and DELETE statements. This is a good addition, but bear in mind that TOP is merely a filter that tells SQL Server that once it starts processing rows, it needs to return only a portion of the rows that the query would return. If the number is hard-coded, it can help in optimization of the query. Using a variable means that the optimizer will need to assume you want all rows. This could have performance implications.

Extensions to the FROM Clause

SQL Server allows the FROM clause to extend the way you can manipulate data with the SELECT, INSERT, UPDATE, and DELETE SQL statements. The FROM clause supports the following:

  • APPLY operator: CROSS APPLY and OUTER APPLY aid with use of user-defined functions.
  • Random data sampling: This enables you to return random sets of rows from a table.
  • Pivoting data: The PIVOT operator allows you to easily rotate the columns in a table to rows.

Join Types

One particular annoyance in early versions of SQL Server was that derived tables could not be correlated to other sets in the FROM clause. For example, the following type of query would not have been allowed:

SELECT Product.productNumber, SalesOrderAverage.averageTotal
FROM Production.Product AS Product
JOIN ( SELECT AVG(lineTotal) AS averageTotal
FROM Sales.SalesOrderDetail as SalesOrderDetail
WHERE Product.ProductID=SalesOrderDetail.ProductID
HAVING COUNT(*) > 1
) AS SalesOrderAverage

Of course, for the most part, a normal query like this was not a great problem because this query could easily be rewritten as an inner join. However, in SQL Server 2000, Microsoft added table-valued user-defined functions, for which rewriting as a join was not possible. The following example seems natural:

SELECT Product.productNumber, SalesOrderAverage.averageTotal
FROM Production.Product AS Product
JOIN dbo.getAverageTotalPerProduct(product.productId)
AS SalesOrderAverage

but this is also illegal. Instead, the function would need to be coded to return all rows for all products, which would probably not perform well at all, especially if you had a great many products.

This was one of the most frustrating problems with using table-based functions in SQL Server 2000. The APPLY operator allows queries of this style to make optimal use of derived tables and scalar functions. The operator injects column data from the left-hand table source into the right-hand source (valued function or derived table). There are two forms of APPLY:

  • CROSS APPLY: Returns only a rowset if the right table source returns data for the parameter values from the left table source columns.
  • OUTER APPLY: Just like an outer join, returns at least one row from the left table source, even if no rowset is returned from the right.

In the next block of code, we use the CROSS APPLY operator to execute the derived table once per row in the Production.Product table. The subquery gets only one row, which is the average amount for a sale of that product. If it hasn’t been sold yet, an empty result set would be the result of the CROSS APPLY operation, and the Production.Product row would not be returned:

SELECT Product.productNumber, SalesOrderAverage.averageTotal
FROM Production.Product AS Product
CROSS APPLY ( SELECT AVG(lineTotal) AS averageTotal
FROM Sales.SalesOrderDetail AS SalesOrderDetail
WHERE Product.ProductID=SalesOrderDetail.ProductID
HAVING COUNT(*) > 0
) AS SalesOrderAverage

This returns the following results (truncated for readability):

(truncated for readability)

The real power of CROSS APPLY is with user-defined functions. Instead of a derived table, let’s create a function that returns the value of the largest sale of a given product from the Sales.SalesOrderDetail table.

CREATE FUNCTION production.getAverageTotalPerProduct
(
@productId int
)
RETURNS @output TABLE (unitPrice DECIMAL(10,4))
AS
BEGIN
INSERT INTO @output (unitPrice)
SELECT AVG(lineTotal) AS averageTotal
FROM Sales.SalesOrderDetail AS SalesOrderDetail
WHERE SalesOrderDetail.ProductID = @productId
HAVING COUNT(*) > 0
RETURN
END

Now the statement can be rewritten as follows:

SELECT Product.ProductNumber, AverageSale.UnitPrice
FROM Production.Product AS Product
CROSS APPLY
Production.getAverageTotalPerProduct(product.productId)
AS AverageSale

This returns the same results as previously. Obviously, your multistatement table-valued functions would usually be a lot more complex than this (otherwise, why bother to make it a function?), but for the purposes of this example, this allows us to make use of the function in a very nice format that is usable, and should have good performance. (Clearly what performs well enough varies on a case-by-case basis. Don’t take anyone’s word for it—test, test, and test some more.)

In the previous example, any product rows where there were no matching unit sales were not included in the output. We change this to an OUTER APPLY to get back all rows in the product table, and NULLs for the unit price column where there have been no sales:

SELECT Product.ProductNumber, AverageSale.UnitPrice
FROM Production.Product AS Product
OUTER APPLY
Production.getAverageTotalPerProduct(product.productId)
AS AverageSale

Products that don’t have any associated sales will return no rows, but you still get a row for each product:

associated sales will return no rows
Random Data Sampling

Sometimes it is desirable to get an idea about the distribution of data in a table, but it is impractical because there is just too much data. Obviously, it is seldom necessary to look at every piece of data to get an idea of how the data looks in the table. Take elections, for an example. An exit poll needs to sample only a small proportion of the population in order to predict the outcome with a high degree of accuracy.

TABLESAMPLE lets you get a random set of rows. You can specify either a percentage of the table or a given number of rows to return. However, what you will likely find weird about this operator is that it seldom returns exactly the same number of rows.

In the next example, the query returns a different selection of rows each time it is executed. What seems odd is just how large the variance in the row count will be. For this query, it typically will be between 1,800 and 3,000, which is approximately 2 percent of the number of rows in the Sales.SalesOrderDetail table, as there are 118,990 in the version of the table we are using and 2 percent is 2,379.

SELECT *
FROM Sales.SalesOrderDetail TABLESAMPLE SYSTEM (2 PERCENT)

There is also an option to get a more exact count of rows:

SELECT *
FROM Sales.SalesOrderDetail TABLESAMPLE SYSTEM (500 rows)

Again, however, it will not actually return only 500 rows; in testing, the count has been between 200 and 800. If you want to get back the same results each time you run one of these queries, you can specify the REPEATABLE option and a fixed seed for the randomizer:

SELECT *
FROM Sales.SalesOrderDetail TABLESAMPLE SYSTEM (500 rows) REPEATABLE (123456)

Given the same seed, you will get the same rows back. One thing to note here: this is not like the repeatable read isolation level. If another user makes changes to the data in the table, you will not get back the exact same rows. It is only true for a given “version” of the table.

As you’ve seen, these commands will not return the exact same number of rows each time, even when you specify a certain number of rows! This behavior has to do with the random nature of the TABLESAMPLE operator, and that the samples are done in a single pass through the table. It should not hurt your queries, but if you need an exact number of rows for a test, you could INTERSECT two table samples (INTERSECT is new and covered later in this chapter in the “EXCEPT and INTERSECT” section) and use TOP to get your set.

Pivoting Data

Excel has had pivot tables for quite some time, allowing users to rotate a table in such a way as to turn rows into columns, and back again. In SQL Server, two relational operators give some of the same functionality to T-SQL. These operators are PIVOT and UNPIVOT.

PIVOT

One thing that was almost impossible to do in T-SQL was to take a set of rows and pivot them to columns. The PIVOT operator allows you to rotate the columns in a table to rows.

As a very simple example (as simple as it can be anyhow), consider the following table of data, which we will call SalesByMonth. (We put it in the Sales schema; if you are not building this in the AdventureWorks database, you may need to create the schema using the command CREATE SCHEMA
Sales.)

CREATE TABLE Sales.SalesByMonth
(
year char(4),
month char(3),
amount MONEY,
PRIMARY KEY (year, month)
)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2007','Jan', 789.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2007','Feb', 389.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2007','Mar', 8867.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2007','Apr', 778.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2007','May', 78.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2007','Jun', 9.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2007','Jul', 987.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2007','Aug', 866.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2007','Sep', 7787.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2007','Oct', 85576.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2007','Nov', 855.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2007','Dec', 5878.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2008','Jan', 7.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2008','Feb', 6868.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2008','Mar', 688.0000)
INSERT INTO Sales.SalesByMonth (year, month, amount)
VALUES('2008','Apr', 9897.0000)

Most likely, this would not be representative of a table in your database, but instead the final output of a query that summarized the data by month. While this is the natural format for a set-based query, it is notlikely that the user will want to look at the data this way; for starters, it is ordered strangely (alphabetic order because of the clustered index on the key!), but even if it were ordered by month, it would still be pretty ugly. It is likely the desired output of this data might be like this:

desired output of this data might be

Using SQL Server 2000 coding techniques, the query would look something along these lines:

SELECT year,
SUM(case when month = 'Jan' then amount else 0 end) AS 'Jan',
SUM(case when month = 'Feb' then amount else 0 end) AS 'Feb',
SUM(case when month = 'Mar' then amount else 0 end) AS 'Mar',
SUM(case when month = 'Apr' then amount else 0 end) AS 'Apr',
SUM(case when month = 'May' then amount else 0 end) AS 'May',
SUM(case when month = 'Jun' then amount else 0 end) AS 'Jun',
SUM(case when month = 'Jul' then amount else 0 end) AS 'Jul',
SUM(case when month = 'Aug' then amount else 0 end) AS 'Aug',
SUM(case when month = 'Sep' then amount else 0 end) AS 'Sep',
SUM(case when month = 'Oct' then amount else 0 end) AS 'Oct',
SUM(case when month = 'Nov' then amount else 0 end) AS 'Nov',
SUM(case when month = 'Dec' then amount else 0 end) AS 'Dec'
FROM Sales.SalesByMonth
GROUP BY year

Not terribly hard to follow, but pretty messy if you start to need more information. Using the new PIVOT operator, the code is changed to this:

SELECT Year, [Jan],[Feb],[Mar],[Apr],[May],[Jun],
[Jul],[Aug],[Sep],[Oct],[Nov],[Dec]
FROM (
SELECT year, amount, month
FROM Sales.SalesByMonth ) AS SalesByMonth
PIVOT ( SUM(amount) FOR month IN
([Jan],[Feb],[Mar],[Apr],[May],[Jun],
[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])
) AS ourPivot
ORDER BY Year

The most important part of this query is this:

PIVOT ( SUM(amount) FOR month IN
([Jan],[Feb],[Mar],[Apr],[May],[Jun],
[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])

It produces a value for each of the columns in the IN clause that matches values in the month column. This is done via the SUM (amount) FOR month section. The moderately confusing part is that it groups on the columns that are not a part of the PIVOT statement. Since year was not in an aggregate, it grouped the pivot on year. Suppose we remove year from the query as follows:

SELECT [Jan],[Feb],[Mar],[Apr],[May],[Jun],
[Jul],[Aug],[Sep],[Oct],[Nov],[Dec]
FROM ( SELECT amount, month
FROM Sales.SalesByMonth ) AS SalesByMonth
PIVOT ( SUM(amount) FOR month IN
([Jan],[Feb],[Mar],[Apr],[May],[Jun],
[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])
) AS ourPivot

The result is that it groups on all rows:

result is that it groups on all rows

When you need to store variable attributes in your schema and you cannot determine all of the data requirements at design time, PIVOT is excellent. For example, a store that has many products may have very different attributes for each product. Instead of having different tables with different attributes for each type of product (a management nightmare), you implement a table that allows for varying properties to be created and a value associated with them. Building these “property” or “attribute” tables is an easy technique for associating values with a product, but writing queries to deliver to the client can be very cumbersome.

As an example, let’s extend the Person.Contact table by creating a Person.ContactProperty table that contains properties that you might not know when the schema is designed. We will store the person’s dog’s name, hair color, and car style:

CREATE TABLE Person.ContactProperty
(
ContactPropertyId int identity(1,1) PRIMARY KEY,
ContactId int REFERENCES Person.Contact(contactid),
PropertyName varchar(20),
PropertyValue SQL_VARIANT,
UNIQUE (contactid, propertyname)
)
INSERT Person.ContactProperty (contactid, propertyname, propertyvalue)
VALUES(1,'dog name','Fido')
INSERT Person.ContactProperty (contactid, propertyname, propertyvalue)
VALUES(1,'hair color','brown')
INSERT Person.ContactProperty (contactid, propertyname, propertyvalue)
VALUES(1,'car style','sedan')
INSERT Person.ContactProperty (contactid, propertyname, propertyvalue)
VALUES(2,'dog name','Rufus')
INSERT Person.ContactProperty (contactid, propertyname, propertyvalue)
VALUES(2,'hair color','blonde')
INSERT Person.ContactProperty (contactid, propertyname, propertyvalue)
VALUES(3,'dog name','Einstein')
INSERT Person.ContactProperty (contactid, propertyname, propertyvalue)
VALUES(3,'car style','coupe')
If we look at the data in a basic T-SQL statement:
SELECT CAST(Contact.firstname + ' ' + Contact.lastname AS varchar(30)) AS Name,
ContactProperty.propertyname, ContactProperty.propertyvalue
FROM Person.Contact AS Contact
INNER JOIN Person.ContactProperty AS ContactProperty
ON ContactProperty.contactid = Contact.contactid

then we get the typical result set of a row per attribute:
typical result set of a row per attribute

It will usually be more desirable to display this data as a single row in the user interface or report. To do this currently, you would need to navigate the results and pivot the rows themselves. Not that this is always the wrong way to go. Data formatting is usually best done using the presentation layer, but in some cases, doing this in the data layer will be best; depending on what kinds of clients are being used, or if you need to use the data in a different SQL query, data in this row-perattribute format may be troublesome to deal with.

Instead, we can use the PIVOT operator to take the propertyname values and rotate the data to provide a single row per contact. We use a derived table, selecting all of the columns required in the PIVOT from our ContactProperty table. We then pivot this, getting the attribute values by propertyname for each contactid.

SELECT CAST(Contact.firstname + ' ' + Contact.lastname AS varchar(30)) AS name,
pivotColumns.* --demonstrating that * works, it should not
--be done this way in production code
FROM Person.Contact AS Contact
INNER JOIN (SELECT contactid, propertyname,propertyvalue
FROM Person.ContactProperty AS property)
as PivotTable
PIVOT( MAX(propertyvalue)
FOR propertyname IN ([dog name],[hair color],[car style]))
AS PivotColumns
ON Contact.contactid = PivotColumns.contactid

We are getting the maximum value of the propertyvalue column for the rotated value, since only one value can be non-null. We are not restricted to only one unique value for the rotation, and this allows all sorts of aggregation to take place, making the PIVOT operator incredibly powerful. Then we specify the column that contains the values that we will pivot around. In our example, we are using the propertyname column, specifying the values that will become the columns in our output. The query output is as follows (noting that there is a null value where there was no value for a given property):

query output is as follows

Note that we just used the first three contactid values from the Person.Contact table for this example.

UNPIVOT

The (almost) opposite effect of PIVOT is, not surprisingly, UNPIVOT. Less useful on a day-to-day-basis, this is a fantastic tool to have when doing data conversions or dealing with poorly designed databases. Going back to the set of data we created with our SalesByMonth query, using the INTO clause, we create a table of data called dbo.SalesByYear:

SELECT Year,[Jan],[Feb],[Mar],[Apr],[May],[Jun],
[Jul],[Aug],[Sep],[Oct],[Nov],[Dec]
INTO Sales.SalesByYear
FROM (
SELECT year, amount, month
FROM Sales.SalesByMonth ) AS SalesByMonth
PIVOT ( SUM(amount) FOR month IN
([Jan],[Feb],[Mar],[Apr],[May],[Jun],
[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])
) AS ourPivot
ORDER BY Year

which contains the following output:

following output

Now, to get this back to the original format, we can use a query like the following:

SELECT Year, CAST(Month AS char(3)) AS Month, Amount
FROM Sales.SalesByYear
UNPIVOT (Amount FOR Month IN
([Jan],[Feb],[Mar],[Apr],[May],[Jun],
[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])) AS unPivoted

which returns this output:

original format, we can use a query like the following

This is another really nice addition to what T-SQL can do natively and will come in handy in quite a few cases. One thing you should note, however: UNPIVOT is not exactly the opposite of PIVOT. Null values in the table will not be returned as rows in the UNPIVOTed output. So, if we had the following table:

Null values in the table will not be returned as rows in the UNPIVOTed output

we would not get rows in our output like the following:

rows in our output

The first two rows would not exist in the output.

OUTPUT

In SQL Server, an OUTPUT clause can be used as part of DML statements to assist in auditing changes made during the statement. The OUTPUT clause specifies a statement to output a result set of changes to a table variable.

Just as with triggers, you use the inserted and deleted tables to access the rows that have been modified in a statement and the data that is being deleted.

For example, change firstname in the Person.Contact table to the reverse of the original value. Note the BEGIN TRANSACTION and ROLLBACK TRANSACTION statements to avoid corrupting the data in the Person.Contact table.

BEGIN TRANSACTION
DECLARE @changes table (change varchar(2000))
UPDATE TOP (10) Person.Contact
SET firstname = Reverse(firstname)
OUTPUT 'Was: ''' + DELETED.firstname +
''' Is: ''' + INSERTED.firstname + ''''
INTO @changes
SELECT *
FROM @changes
ROLLBACK TRANSACTION
--note that local variable tables are not affected by transactions!

This returns the following (assuming you haven’t already made changes


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

SQL Server 2008 Topics