DML in SQL Server continues to evolve with each release. The following features are available to make querying against SQL Server easier for the user:
- Common table expressions (CTEs), including recursive queries
- Enhancements to the TOP operator
- Extensions to the FROM clause (the APPLY operator and TABLESAMPLE)
- The PIVOT and UNPIVOT operators
- The OUTPUT clause
Each of these changes (especially the common table expressions) makes the base DML statements (INSERT, UPDATE, DELETE, and SELECT) far more powerful and functional.
Old-Style Outer Joins Deprecated
This one should probably be in the title of the book, as it is going to be the most painful of all changes. Microsoft has said over and over, and book authors and experts (not always the same group, mind you) have said over and over: stop using non-ANSI-style joins; they will not be supported in a future version. Well, the future is now. Consider the following query:
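The query itself is not reproduced above; a minimal sketch of the deprecated non-ANSI syntax (the AdventureWorks table and column names are illustrative assumptions) would look something like this:

```sql
-- Non-ANSI (old-style) left outer join using the deprecated *= operator
SELECT   Customer.CustomerID, SalesOrderHeader.SalesOrderID
FROM     Sales.Customer, Sales.SalesOrderHeader
WHERE    Customer.CustomerID *= SalesOrderHeader.CustomerID
```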
Trying this query will give you the following (really long and very descriptive) error message that says it as well as we could:

Msg 4147, Level 15, State 1, Line 5
To run this query without modification, please set the compatibility level for current database to 80 or lower, using stored procedure sp_dbcmptlevel. It is strongly recommended to rewrite the query using ANSI outer join operators (LEFT OUTER JOIN, RIGHT OUTER JOIN). In the future versions of SQL Server, non-ANSI join operators will not be supported even in backward-compatibility modes.
So what does this mean? It means that you need to rewrite your queries using ANSI join operators:

SELECT *
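A sketch of the same query rewritten with the ANSI join operator (the table names match the earlier illustrative example):

```sql
-- ANSI-style LEFT OUTER JOIN replacing the old *= operator
SELECT   Customer.CustomerID, SalesOrderHeader.SalesOrderID
FROM     Sales.Customer
         LEFT OUTER JOIN Sales.SalesOrderHeader
             ON Customer.CustomerID = SalesOrderHeader.CustomerID
```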
Note that this restriction is only for outer joins, not for inner joins. The following works just fine and likely always will:

SELECT *
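For example, an old-style (comma-separated) inner join of this form remains legal (table names are illustrative):

```sql
-- Old-style inner join: the join condition lives in the WHERE clause
SELECT   Customer.CustomerID, SalesOrderHeader.SalesOrderID
FROM     Sales.Customer, Sales.SalesOrderHeader
WHERE    Customer.CustomerID = SalesOrderHeader.CustomerID
```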
Clearly, you should code this using an INNER JOIN, but the preceding will work because this syntax is required for correlated subqueries.
Common Table Expressions
The primary purpose of common table expressions (CTEs) is code simplification. They are loosely analogous to views in that they are defined as a single SELECT statement.
Once defined, CTEs are used exactly like views. To illustrate, we’ll take a look at a very simple example. The following query defines a CTE named simpleExample that contains a single column and row. Then a simple SELECT statement is issued to return the data defined by the CTE.

WITH simpleExample AS
which returns the following:

columnName
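Assembled into a complete statement, the example might look like this (the literal value is an assumption, since the original listing is not shown):

```sql
-- A minimal CTE: one column, one row
WITH simpleExample AS
(
    SELECT 'hi' AS columnName
)
SELECT columnName
FROM   simpleExample
```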
What makes the CTE significantly different from a view is that the CTE is not created as an object in the database and therefore is only available for this single statement. So in this sense, it will be treated by the compiler more or less as if you had coded a derived table (a named table expression that exists only for the duration of a query) as follows:

SELECT columnName
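The equivalent derived-table form of the same minimal example would be:

```sql
-- Same result as the CTE, expressed as a derived table
SELECT columnName
FROM   (SELECT 'hi' AS columnName) AS simpleExample
```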
This is an obviously simple example, but it serves to illustrate the basics of CTEs. If you need to reference a given derived table multiple times in a query, this method is a great help: instead of recoding the query over and over, you simply reference the CTE by name. If the derived table is very large, it greatly simplifies the final query, making the result much easier to debug.
The performance of a CTE should be on par with that of a derived table. If a query references the same derived table multiple times, the derived table is evaluated multiple times; the same is true for CTEs. In some cases, it is better to store the results in a temporary table instead, especially for complex CTEs that are referenced multiple times in a query.
There are two common uses for CTEs: simplifying complex queries and writing recursive queries against hierarchies.
Simplifying Complex Queries
Consider the following scenario: the client needs a query that has subqueries to calculate salesperson totals for the year to date, compare that value to entire sales for the company, and then compare the salespeople’s sales to their quota for the year.
First, consider how this would be done using the SQL Server 2000 syntax. Each subset could be implemented as a derived table in a single query.
Of real interest here is that it is the same subquery over and over. This is a beast of a query to follow, and not overly pleasant to write. You may be thinking that each of the derived tables could be implemented as a view, but if this is the only situation where they will be used, the management overhead of implementing three views in a production environment would not be worth the effort. You might also use a temp table, which is what we generally end up doing when we come upon this sort of situation. This is not necessarily the best way to implement something so complex in SQL Server 2000, but it was the only way to do it in one statement (a goal for writing powerful SQL, but not always readable or maintainable code).
Now let’s reformulate this query using the CTE syntax. As demonstrated in the following query, you define two CTEs to replace the derived tables in the main query, producing exactly the same results with a semantically identical, but much clearer, query.

-- SQL Server CTE syntax
While this is still no beginner query, it is much easier to read and understand than the previous version. Not to mention that you can now test the CTE code by simply executing the following:

WITH YTDSalesPerson
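Since the full listing is not shown, here is a hedged sketch of the approach: two CTEs standing in for the repeated derived tables. The AdventureWorks column names and the year-to-date date filter are assumptions for illustration.

```sql
-- Sketch: CTEs replacing repeated derived tables
WITH YTDSalesPerson AS
(
    SELECT soh.SalesPersonID, SUM(soh.TotalDue) AS YTDSales
    FROM   Sales.SalesOrderHeader AS soh
    WHERE  soh.OrderDate >= '20040101'      -- illustrative year-to-date filter
    GROUP BY soh.SalesPersonID
),
CompanyYTDSales AS
(
    -- A CTE may reference an earlier CTE in the same WITH list
    SELECT SUM(YTDSales) AS TotalSales
    FROM   YTDSalesPerson
)
SELECT sp.SalesPersonID,
       ytd.YTDSales,
       ytd.YTDSales / cys.TotalSales AS PortionOfCompanySales,
       ytd.YTDSales / sp.SalesQuota  AS PortionOfQuota
FROM   Sales.SalesPerson AS sp
       JOIN YTDSalesPerson AS ytd
           ON ytd.SalesPersonID = sp.SalesPersonID
       CROSS JOIN CompanyYTDSales AS cys
```

Each CTE can then be tested on its own by attaching a simple SELECT to it, which is exactly the debugging benefit described above.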
You can use this instead of having to deal with pulling out large sections of code from your original query. And say you make a correction or change to the calculation. It changes once, and all usages change. All this without having to persist any data or objects. SQL Server does the real work! Consider also that the “black boxed” CTE works independently of the main query, so after testing it, you won’t need to worry if this part of the query works as you debug the larger query.
Using CTEs for Recursive Queries
The second use of CTEs allows for recursive hierarchies to be navigated without the need for complex recursion logic done manually. Hierarchies are common in real-world situations, such as employees and managers (managers are employees), or in manufacturing with parts within other parts (an automobile engine is a part, but it also consists of a lot of other parts inside).
Using T-SQL, it was previously complicated to build queries that dealt with hierarchies. If the number of levels in a hierarchy was known beforehand, it was possible to use self joins, but even that approach was very cumbersome. If you had unlimited numbers of levels, it was practically impossible. CTEs can make hierarchies manageable.
BASIC TREE HANDLING IN T-SQL
Take a look at an example that uses the Employee table in the AdventureWorks database. This is a classic example of a single-parent hierarchy, with the managerId column identifying the manager each employee reports to. Let’s look at a typical breadth-first hierarchical query used in SQL Server 2000 to retrieve all employees whose managerId was 140.

-- SQL Server 2000 example
Using CTEs, however, gives a much cleaner implementation:

-- SQL Server 2008 syntax
So let’s take this query apart and look at how it works. First, we define the name of the CTE and the names of the columns that will be output:

WITH EmployeeHierarchy (employeeid, managerid)
The first query gets the top level of the hierarchy. It is not required to return only a single row, but in our case, we are getting all rows where managerId = 140; that is, all people who work directly for that manager. We are using a variable here to keep it equivalent to the SQL Server 2000 example.

SELECT employeeid, managerid
Then use UNION ALL to connect these rows to the next specified set:

UNION ALL
UNION ALL does not remove duplicates, so you will need to be careful to avoid cyclical relationships in your data. The problem arises when a child row is also an ancestor of itself.
For example, consider the following table of data:

employeeId   managerId
----------   ---------
1            3
3            5
5            1
The manager of employee 1 is employee 3, the manager of employee 3 is employee 5, and the manager of employee 5 is employee 1. So the recursion would fall into an infinite loop: the manager of 1 is 3, and on it goes, until SQL Server stops the query at its recursion limit (by default, a recursive CTE errors out after 100 levels, a limit you can adjust with the MAXRECURSION query hint).
So now we get to the cool part. We get the children of the rows returned by the first part of the query by referencing the CTE by name (in our case, EmployeeHierarchy). This is how the recursion works: the query joins to itself by naming itself (hence, the recursive nature of a query calling itself). It continues until no employeeid is the manager of anyone else.

SELECT Employee.employeeid, Employee.managerid
Finally, we use the CTE. The recursion happens without us even noticing from the outside: the set of rows is materialized and then joined to the full Employee table. The result is all persons who report to a given manager.

SELECT Employee.employeeid, Contact.lastname, Contact.firstname
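Putting the pieces together, the full recursive query might look like the following sketch. The table and column names follow the AdventureWorks conventions used in the text; the exact original listing is not shown, so treat this as an assumption-laden reconstruction.

```sql
-- Sketch: full recursive CTE assembled from the pieces discussed above
DECLARE @managerId int
SET @managerId = 140

;WITH EmployeeHierarchy (employeeid, managerid) AS
(
    -- Anchor member: everyone who reports directly to @managerId
    SELECT employeeid, managerid
    FROM   HumanResources.Employee
    WHERE  managerid = @managerId

    UNION ALL

    -- Recursive member: children of rows already in EmployeeHierarchy
    SELECT Employee.employeeid, Employee.managerid
    FROM   HumanResources.Employee AS Employee
           JOIN EmployeeHierarchy
               ON Employee.managerid = EmployeeHierarchy.employeeid
)
SELECT Employee.employeeid, Contact.lastname, Contact.firstname
FROM   HumanResources.Employee AS Employee
       JOIN EmployeeHierarchy
           ON Employee.employeeid = EmployeeHierarchy.employeeid
       JOIN Person.Contact AS Contact
           ON Contact.contactid = Employee.contactid
```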
Before we leave CTEs altogether, there is one more really neat thing that you can do with them. In the query that gets the top level of the hierarchy, you can introduce values into the CTE that you can manipulate throughout the iterations of the query. The following query extends the previous one to include a level indicator and a hierarchy path of the employee’s managers, up to the top manager for the employee. In the recursive part of the CTE, we increment the tree level by one and append the employeeid from the current level to the hierarchy path. We use varchar(max) since we don’t know how big the hierarchy will get.

DECLARE @managerId int
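A sketch of that extended query, with the level counter and hierarchy path carried through the recursion (the path separator and data types are assumptions):

```sql
-- Sketch: recursive CTE carrying a level counter and a hierarchy path
DECLARE @managerId int
SET @managerId = 140

;WITH EmployeeHierarchy (employeeid, managerid, treelevel, hierarchy) AS
(
    SELECT employeeid, managerid,
           1 AS treelevel,
           CAST(employeeid AS varchar(max)) AS hierarchy
    FROM   HumanResources.Employee
    WHERE  managerid = @managerId

    UNION ALL

    SELECT Employee.employeeid, Employee.managerid,
           EmployeeHierarchy.treelevel + 1,
           EmployeeHierarchy.hierarchy + '\'
               + CAST(Employee.employeeid AS varchar(20))
    FROM   HumanResources.Employee AS Employee
           JOIN EmployeeHierarchy
               ON Employee.managerid = EmployeeHierarchy.employeeid
)
SELECT employeeid, treelevel, hierarchy
FROM   EmployeeHierarchy
ORDER BY hierarchy
```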
Running this returns the following:
So now we have our hierarchy: it is sorted by manager, and we know how far each employee is from the manager we passed in as a parameter.
CTEs will be helpful in cleaning up code for sure, but more important, recursive queries will be far easier to deal with than ever before.
The TOP Operator

The TOP operator is used to limit the number of rows returned by a query. In previous versions of SQL Server, TOP required a literal value, so if you needed to parameterize the number of rows to return, the alternative was SET ROWCOUNT @variableName, which restricted the rows affected by all following T-SQL statements. You then had to remember to reset ROWCOUNT to 0 after every statement you used it on, or big problems would occur when subsequent statements affected only a couple of rows.
TOP affects only a single statement, and it now allows a variable to set the number of rows affected. For example, you can return a variable number of rows from the AdventureWorks Person.Contact table:

DECLARE @rowsToReturn int
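A minimal sketch of the parameterized form (note the parentheses around the variable, which are required):

```sql
-- TOP with a variable instead of a literal
DECLARE @rowsToReturn int
SET @rowsToReturn = 10

SELECT TOP (@rowsToReturn) *
FROM   Person.Contact
```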
If the value of the TOP parameter is invalid, such as a null value or a negative value, SQL Server will return an error. In SQL Server, you may update views that are defined using the TOP operator. However, after you update the values that show up in the view, the rows may vanish from the view.
Beyond parameterization, another change to the TOP operator is that it will work with INSERT, UPDATE, and DELETE (again with parameters if desired).
For example, say you have a table and you want to insert the top five rows from a result set (in this case, one that just has seven values from a UNION statement). You can do this with INSERT TOP (N):

CREATE TABLE testTop

The derived table contains seven rows, but the TOP operator on the INSERT statement means that only five of them will be inserted:

value

Now, you can use TOP on an UPDATE to change two rows only:

UPDATE TOP (2) testTop

which returns the following:

value

Finally, using DELETE, you can remove three rows from the table:

DELETE TOP (3) testTop

which results in this output:

value

Each of these techniques is really only useful when you are trying to batch-modify a set. So you can do something like this:

INSERT TOP (10) otherTable (batchTableId, value)
It is not a technique that will be extremely useful, but it is a much more elegant solution than using SET ROWCOUNT, because it is clear to the query processor from which statement you intend to limit the number of rows. Any other statements in any subordinate objects like triggers do not need to be subject to the limitation, as was the case with ROWCOUNT.
This example has a pretty big “gotcha” that may not be completely obvious to you. Notice there was no mention of which rows would be modified or deleted. As cannot be stated too many times, tables are unordered sets, and modification statements have no ORDER BY clause to order the set they deal with. It will literally just INSERT, UPDATE, or DELETE whatever number of rows you state, purely at the will of the optimizer. (SQL Server Books Online goes so far as to state that the rows affected when TOP is used will be “random,” which, while technically true, is perhaps pushing it.) It stands to reason that whichever rows are easiest to modify will be the ones chosen. This is unlike the way a SELECT statement returns rows based on an ORDER BY clause if one is present, and it can be a bit confusing. Consider this example:

CREATE TABLE top10sales
Ten rows from the Sales.SalesOrderHeader table will be returned and inserted, but they will not be the ten highest values. For that, you can use the following:

INSERT top10sales
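A sketch of that ordered insert (assuming, as the text implies, that top10sales has salesOrderId and totalDue columns):

```sql
-- ORDER BY in the SELECT makes TOP deterministic: the ten largest orders
INSERT INTO top10sales (salesOrderId, totalDue)
SELECT TOP (10) SalesOrderID, TotalDue
FROM   Sales.SalesOrderHeader
ORDER BY TotalDue DESC
```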
Now the values in the top10sales table are the top ten values of Sales.SalesOrderHeader.totalDue. The value of adding TOP to the INSERT, UPDATE, and DELETE statements is to facilitate batching operations. For example, you might want to delete a large number of rows from a table, a few at a time, to keep each transaction small, thereby maintaining concurrent use of the table with a small number of locks. So you could execute the following:

BEGIN TRANSACTION
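A hedged sketch of the batching pattern (the table name and filter are hypothetical placeholders, not from the original listing):

```sql
-- Delete in batches of 1,000 rows to keep each transaction small
WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION

    DELETE TOP (1000) FROM dbo.largeTable       -- hypothetical table
    WHERE  createDate < '20020101'              -- hypothetical filter

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION
        BREAK                                    -- nothing left to delete
    END

    COMMIT TRANSACTION
END
```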
Note that using TOP is the preferred way of acting on only a fixed number of rows, rather than using SET ROWCOUNT <N>. SET ROWCOUNT is deprecated and will be removed in the next version of SQL Server.
So, the TOP operator accepts variables as well as literal values and can be used with INSERT, UPDATE, and DELETE statements. This is a good addition, but bear in mind that TOP is merely a filter that tells SQL Server that once it starts processing rows, it needs to return only a portion of the rows that the query would return. If the number is hard-coded, it can help in optimization of the query. Using a variable means that the optimizer will need to assume you want all rows. This could have performance implications.
Extensions to the FROM Clause
SQL Server extends the FROM clause to give you more ways to manipulate data with the SELECT, INSERT, UPDATE, and DELETE statements. The FROM clause supports the following:
- The APPLY operator (CROSS APPLY and OUTER APPLY), for invoking a table-valued expression once per row of the left-hand table
- The TABLESAMPLE clause, for random data sampling
One particular annoyance in early versions of SQL Server was that derived tables could not be correlated to other sets in the FROM clause. For example, the following type of query would not have been allowed:

SELECT Product.productNumber, SalesOrderAverage.averageTotal
Of course, for the most part, a normal query like this was not a great problem because this query could easily be rewritten as an inner join. However, in SQL Server 2000, Microsoft added table-valued user-defined functions, for which rewriting as a join was not possible. The following example seems natural:

SELECT Product.productNumber, SalesOrderAverage.averageTotal
but this is also illegal. Instead, the function would need to be coded to return all rows for all products, which would probably not perform well at all, especially if you had a great many products.
This was one of the most frustrating problems with using table-valued functions in SQL Server 2000. The APPLY operator allows queries of this style to make optimal use of derived tables and table-valued functions. The operator injects column data from the left-hand table source into the right-hand source (a table-valued function or derived table). There are two forms of APPLY: CROSS APPLY and OUTER APPLY.
In the next block of code, we use the CROSS APPLY operator to execute the derived table once per row in the Production.Product table. The subquery gets only one row, which is the average amount for a sale of that product. If it hasn’t been sold yet, an empty result set would be the result of the CROSS APPLY operation, and the Production.Product row would not be returned:

SELECT Product.productNumber, SalesOrderAverage.averageTotal
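A sketch of that correlated derived table under CROSS APPLY (the HAVING clause is what makes never-sold products disappear; the column choices are assumptions):

```sql
-- CROSS APPLY lets the derived table reference Product.ProductID
SELECT Product.ProductNumber, SalesOrderAverage.averageTotal
FROM   Production.Product AS Product
       CROSS APPLY
       (
           SELECT AVG(SalesOrderDetail.LineTotal) AS averageTotal
           FROM   Sales.SalesOrderDetail AS SalesOrderDetail
           WHERE  SalesOrderDetail.ProductID = Product.ProductID
           HAVING COUNT(*) > 0     -- no row at all if the product never sold
       ) AS SalesOrderAverage
```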
This returns the following results (truncated for readability):
The real power of CROSS APPLY is with user-defined functions. Instead of a derived table, let’s create a function that returns the average sale value of a given product from the Sales.SalesOrderDetail table:

CREATE FUNCTION production.getAverageTotalPerProduct
Now the statement can be rewritten as follows:

SELECT Product.ProductNumber, AverageSale.UnitPrice
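Since the function body is not shown, here is a hedged sketch of what such a multistatement table-valued function and its use might look like; the body and the returned column are assumptions based on the surrounding text.

```sql
-- Sketch: a multistatement table-valued function returning the average
-- unit price for one product, then applied per product row
CREATE FUNCTION Production.getAverageTotalPerProduct
(
    @productId int
)
RETURNS @output TABLE (UnitPrice money)
AS
BEGIN
    INSERT INTO @output (UnitPrice)
    SELECT AVG(UnitPrice)
    FROM   Sales.SalesOrderDetail
    WHERE  ProductID = @productId
    HAVING COUNT(*) > 0    -- return no row for never-sold products
    RETURN
END
GO

SELECT Product.ProductNumber, AverageSale.UnitPrice
FROM   Production.Product AS Product
       CROSS APPLY Production.getAverageTotalPerProduct(Product.ProductID)
           AS AverageSale
```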
This returns the same results as previously. Obviously, your multistatement table-valued functions would usually be a lot more complex than this (otherwise, why bother to make it a function?), but for the purposes of this example, this allows us to make use of the function in a very nice format that is usable, and should have good performance. (Clearly what performs well enough varies on a case-by-case basis. Don’t take anyone’s word for it—test, test, and test some more.)
In the previous example, any product rows with no matching unit sales were not included in the output. We change this to an OUTER APPLY to get back all rows in the product table, with NULL for the unit price column where there have been no sales:

SELECT Product.ProductNumber, AverageSale.UnitPrice
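Assuming the getAverageTotalPerProduct function described in the text, the OUTER APPLY form differs only in the operator:

```sql
-- OUTER APPLY keeps every product row; UnitPrice is NULL when no sales exist
SELECT Product.ProductNumber, AverageSale.UnitPrice
FROM   Production.Product AS Product
       OUTER APPLY Production.getAverageTotalPerProduct(Product.ProductID)
           AS AverageSale
```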
The function returns no rows for products that don’t have any associated sales, but you still get a row for each product, with NULL for the unit price:
Random Data Sampling
Sometimes it is desirable to get an idea about the distribution of data in a table, but it is impractical because there is just too much data. Obviously, it is seldom necessary to look at every piece of data to get an idea of how the data looks in the table. Take elections, for example. An exit poll needs to sample only a small proportion of the population in order to predict the outcome with a high degree of accuracy.
TABLESAMPLE lets you get a random set of rows. You can specify either a percentage of the table or a given number of rows to return. However, what you will likely find weird about this operator is that it seldom returns exactly the number of rows you ask for.
In the next example, the query returns a different selection of rows each time it is executed. What seems odd is just how large the variance in the row count is. For this query, it will typically be between 1,800 and 3,000 rows, which is roughly 2 percent of the Sales.SalesOrderDetail table: there are 118,990 rows in the version of the table we are using, and 2 percent of that is 2,379.

SELECT *
There is also an option to ask for an exact count of rows:

SELECT *

Again, however, it will not actually return exactly 500 rows; in testing, the count has been between 200 and 800. If you want to get back the same results each time you run one of these queries, you can specify the REPEATABLE option and a fixed seed for the randomizer:

SELECT *
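Sketches of the three variants just described (the seed value is arbitrary):

```sql
-- Roughly 2 percent of the table; the row count varies from run to run
SELECT * FROM Sales.SalesOrderDetail TABLESAMPLE (2 PERCENT)

-- Approximately 500 rows (again, not exact)
SELECT * FROM Sales.SalesOrderDetail TABLESAMPLE (500 ROWS)

-- The same sample each run (for the same version of the data), given a seed
SELECT * FROM Sales.SalesOrderDetail TABLESAMPLE (500 ROWS) REPEATABLE (123456)
```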
Given the same seed, you will get the same rows back. One thing to note here: this is not like the repeatable read isolation level. If another user makes changes to the data in the table, you will not get back the exact same rows. It is only true for a given “version” of the table.
As you’ve seen, these commands will not return the exact same number of rows each time, even when you specify a certain number of rows! This behavior has to do with the random nature of the TABLESAMPLE operator, and that the samples are done in a single pass through the table. It should not hurt your queries, but if you need an exact number of rows for a test, you could INTERSECT two table samples (INTERSECT is new and covered later in this chapter in the “EXCEPT and INTERSECT” section) and use TOP to get your set.
PIVOT and UNPIVOT

Excel has had pivot tables for quite some time, allowing users to rotate a table in such a way as to turn rows into columns, and back again. In SQL Server, two relational operators give some of the same functionality to T-SQL: PIVOT and UNPIVOT.
One thing that was almost impossible to do in T-SQL was to take a set of rows and pivot them to columns. The PIVOT operator allows you to rotate the rows in a table into columns.
As a very simple example (as simple as it can be, anyhow), consider the following table of data, which we will call SalesByMonth. (We put it in the Sales schema; if you are not building this in the AdventureWorks database, you may need to create the schema using the command CREATE SCHEMA Sales.)
Most likely, this would not be representative of a table in your database, but rather the final output of a query that summarizes the data by month. While this is the natural format for a set-based query, it is not likely how the user will want to look at the data: for starters, it is ordered strangely (alphabetically, because of the clustered index on the key!), but even if it were ordered by month, it would still be pretty ugly. The desired output of this data would more likely look like this:
Using SQL Server 2000 coding techniques, the query would look something along these lines:

SELECT year,
Not terribly hard to follow, but pretty messy if you start to need more information. Using the new PIVOT operator, the code is changed to this:

SELECT Year, [Jan],[Feb],[Mar],[Apr],[May],[Jun],
The most important part of this query is this:

PIVOT ( SUM(amount) FOR month IN
It produces a value for each of the columns in the IN clause that matches values in the month column. This is done via the SUM(amount) FOR month section. The moderately confusing part is that the pivot groups on the columns that are not part of the PIVOT clause. Since year was not in an aggregate, the pivot grouped on year. Suppose we remove year from the query as follows:

SELECT [Jan],[Feb],[Mar],[Apr],[May],[Jun],
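A hedged sketch of the full PIVOT query over the Sales.SalesByMonth table (the source column names year, month, and amount are taken from the text):

```sql
-- Rotate month rows into twelve month columns, one output row per year
SELECT Year, [Jan],[Feb],[Mar],[Apr],[May],[Jun],
             [Jul],[Aug],[Sep],[Oct],[Nov],[Dec]
FROM   (SELECT Year, Month, Amount
        FROM   Sales.SalesByMonth) AS salesByMonth
PIVOT  (SUM(Amount) FOR Month IN ([Jan],[Feb],[Mar],[Apr],[May],[Jun],
                                  [Jul],[Aug],[Sep],[Oct],[Nov],[Dec])) AS pvt
ORDER BY Year
```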
The result is that it aggregates over all rows:
When you need to store variable attributes in your schema and you cannot determine all of the data requirements at design time, PIVOT is excellent. For example, a store that has many products may have very different attributes for each product. Instead of having different tables with different attributes for each type of product (a management nightmare), you implement a table that allows for varying properties to be created and a value associated with them. Building these “property” or “attribute” tables is an easy technique for associating values with a product, but writing queries to deliver to the client can be very cumbersome.
As an example, let’s extend the Person.Contact table by creating a Person.ContactProperty table that contains properties that you might not know when the schema is designed. We will store the person’s dog’s name, hair color, and car style:

CREATE TABLE Person.ContactProperty
Then we get the typical result set of a row per attribute:
It will usually be more desirable to display this data as a single row in the user interface or report. Without PIVOT, you would need to navigate the results and pivot the rows yourself. Not that this is always the wrong way to go: data formatting is usually best done in the presentation layer. But in some cases, doing it in the data layer is best; depending on what kinds of clients are being used, or if you need to use the data in a different SQL query, data in this row-per-attribute format may be troublesome to deal with.
Instead, we can use the PIVOT operator to take the propertyname values and rotate the data to provide a single row per contact. We use a derived table, selecting all of the columns required in the PIVOT from our ContactProperty table. We then pivot this, getting the attribute values by propertyname for each contactid.

SELECT CAST(Contact.firstname + ' ' + Contact.lastname AS varchar(30)) AS name,
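A hedged sketch of that query; the exact property-name values ([dog name], [hair color], [car style]) and the contactid filter are assumptions based on the surrounding description.

```sql
-- Pivot the property rows into one row per contact, then join to get names
SELECT CAST(Contact.firstname + ' ' + Contact.lastname AS varchar(30)) AS name,
       pivotedProperties.[dog name], pivotedProperties.[hair color],
       pivotedProperties.[car style]
FROM   (SELECT contactid, propertyname, propertyvalue
        FROM   Person.ContactProperty) AS ContactProperty
       PIVOT (MAX(propertyvalue)
              FOR propertyname IN ([dog name], [hair color], [car style]))
           AS pivotedProperties
       JOIN Person.Contact AS Contact
           ON Contact.contactid = pivotedProperties.contactid
WHERE  Contact.contactid <= 3       -- first three contacts, as in the text
```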
We are getting the maximum value of the propertyvalue column for the rotated value, since only one value can be non-null. We are not restricted to only one unique value for the rotation, and this allows all sorts of aggregation to take place, making the PIVOT operator incredibly powerful. Then we specify the column that contains the values that we will pivot around. In our example, we are using the propertyname column, specifying the values that will become the columns in our output. The query output is as follows (noting that there is a null value where there was no value for a given property):
Note that we just used the first three contactid values from the Person.Contact table for this example.
The (almost) opposite of PIVOT is, not surprisingly, UNPIVOT. Less useful on a day-to-day basis, this is a fantastic tool to have when doing data conversions or dealing with poorly designed databases. Going back to the set of data we created with our SalesByMonth query, using the INTO clause, we create a table of data called dbo.SalesByYear:

SELECT Year,[Jan],[Feb],[Mar],[Apr],[May],[Jun],
which contains the following output:
Now, to get this back to the original format, we can use a query like the following:

SELECT Year, CAST(Month AS char(3)) AS Month, Amount
which returns this output:
This is another really nice addition to what T-SQL can do natively and will come in handy in quite a few cases. One thing you should note, however: UNPIVOT is not exactly the opposite of PIVOT. Null values in the table will not be returned as rows in the UNPIVOTed output. So, if we had the following table:
we would not get rows in our output like the following:
The first two rows would not exist in the output.
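For reference, a sketch of the UNPIVOT query over dbo.SalesByYear (column names follow the PIVOT example; remember that rows with NULL amounts are simply dropped from the output):

```sql
-- Rotate the twelve month columns back into (Year, Month, Amount) rows
SELECT Year, CAST(Month AS char(3)) AS Month, Amount
FROM   dbo.SalesByYear
UNPIVOT (Amount FOR Month IN ([Jan],[Feb],[Mar],[Apr],[May],[Jun],
                              [Jul],[Aug],[Sep],[Oct],[Nov],[Dec])) AS unpvt
```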
The OUTPUT Clause

In SQL Server, an OUTPUT clause can be used as part of a DML statement to help audit the changes that statement makes. The OUTPUT clause returns a result set of the changes, which can be captured, for example, in a table variable.
Just as with triggers, you use the inserted and deleted tables to access the new versions of the modified rows and the data as it existed before the statement.
For example, let’s change firstname in the Person.Contact table to the reverse of its original value. Note the BEGIN TRANSACTION and ROLLBACK TRANSACTION statements, which keep us from corrupting the data in the Person.Contact table.

BEGIN TRANSACTION
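A hedged sketch of that pattern (the table-variable column names and the contactid filter are illustrative assumptions):

```sql
-- Capture old and new firstname values with OUTPUT, then roll back
BEGIN TRANSACTION

DECLARE @changes TABLE
(
    oldFirstName nvarchar(50),
    newFirstName nvarchar(50)
)

UPDATE Person.Contact
SET    firstname = REVERSE(firstname)
OUTPUT deleted.firstname,        -- value before the update
       inserted.firstname        -- value after the update
       INTO @changes (oldFirstName, newFirstName)
WHERE  contactid <= 3            -- illustrative filter

SELECT * FROM @changes

ROLLBACK TRANSACTION             -- leave the table as we found it
```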
This returns the following (assuming you haven’t already made changes to the data):