Let's Do CRUD with Data - T-SQL

As previously mentioned, you can really only do four things with data: Create it, Read it, Update it, and Delete it. These four operations form the basis of what is commonly called CRUD operations. So far, you've been doing a lot of reading using the SELECT statement. Now it's time to work with data in a way that will affect the data in your database.

Adding Records

Adding rows is done using the INSERT statement. Inserts can be performed one row at a time or on multiple rows, depending on the technique, and target only one table. Before attempting to insert data into a table, it is important to know the following:

  • Which columns require values
  • Which columns have data-integrity constraints
  • Which columns are managed by the database through functions
  • Which columns have default values or allow null values
  • What the data types of the destination columns are

To get started, take a look at the Production.Location table in design view (see the figure below).

This table has changed a little in the newer AdventureWorks2008 database, but we're just using it as an example. The table shown has five columns, but only one column, the Name, requires a value because every other column either has a default value or accepts NULL. The LocationID column is the primary key. Because it is designated as an identity column, an incremental value will automatically be generated for this column by the database.


INSERT Statement

If you only need to provide a value for the Name column, the statement would be quite simple:

INSERT INTO Production.Location ( Name ) SELECT 'Secret Hiding Place'

The INTO keyword is optional and is often omitted. We like to include it because we think it reads more like natural language. Two different styles are used with the INSERT statement, and each has its own subtleties. Generally, you could select one technique and pretty much use it for all of your insert operations. We'll show you some examples of each and let you decide when to use them.

INSERT... Values

The pattern of the INSERT statement is to provide a list of column names and then a list of values in parentheses. If values are provided for all columns or all columns with the exception of the identity column (only one identity column is allowed per table), the values are inserted in order, so no column name list is required. This technique is used to insert only one row into the table.

The Contact table is a good example going forward, but, like most of the tables in the new version of the AdventureWorks sample database, it includes some complicated features that will only get in the way right now. For the following examples, we will create a new table, similar to the Contact table, named MyContacts, by executing the following script. Note that the extra spaces and carriage returns are unnecessary and are added only to make this query more readable.

The specific commands and options used to create this table will be covered in Chapter (Creating and Managing Database Objects). For now, just execute this script to help demonstrate how new records are inserted into this table.
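Here is a minimal sketch of what that script might look like, with the column list inferred from the INSERT examples that follow. The column names, data types, and sizes are assumptions, and the book's actual script may differ slightly (it may, for example, carry additional columns from the Contact table).

CREATE TABLE MyContacts
(
    ContactID     int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- managed by the database
    Title         nvarchar(8)   NULL,
    FirstName     nvarchar(50)  NOT NULL,
    MiddleName    nvarchar(50)  NULL,
    LastName      nvarchar(50)  NOT NULL,
    Suffix        nvarchar(10)  NULL,
    EmailAddress  nvarchar(50)  NULL,
    Phone         nvarchar(25)  NULL,
    IsActive      bit NOT NULL DEFAULT 0    -- the DEFAULT rule referred to later in this section
)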

For our first example, we are going to include values for only four of the columns. Note that we are skipping the ContactID column. If we were to include a value for the ContactID column, our INSERT statement would fail because this column can only be managed by the IDENTITY() function of the database. In the INSERT statement itself, we include the column names and in the Values list, we provide a corresponding value:

INSERT INTO MyContacts (Title, FirstName, LastName, Phone) VALUES ('Ms.', 'Pebbles', 'Flintstone', '123-4567')

Note the use of single quotes to denote literal values to be inserted into the table. If we had a numeric value to insert, we would not use the quotes. The order of the values must match the order of the columns in the table if no column list is provided, with the exception of any identity column, which is omitted, as the following example shows:

INSERT INTO MyContacts VALUES ('Mr.', 'Fred', 'Caveman', 'Flintstone', 'Phd', 'fredf@bedrock.com', '(111) 123-4567', DEFAULT)

There are a couple of things to think about in this query that we are not quite ready to cover. First, not all of the table columns are included in the query. Second, one of the columns has a DEFAULT rule defined for it. Because we are just working on the basics right now, don't worry about this. It will be covered in the upcoming section "Inserting NULL, Defaults, and Other Column Considerations."

INSERT... SELECT

This form of the INSERT statement is similar to the previous values form except it uses a SELECT statement to provide values. As discussed in earlier chapters, the SELECT statement can return scalar literal values, as well as sets of literal values, without even hitting a table in the database. Because SELECT can be used to include a variety of different types of values (from tables, joins, unions, groups, aggregates, and literals), this is a much more flexible technique. Practically anything can be inserted as long as the column count and data types match. Because a SELECT statement can return multiple rows, this would result in multiple rows being inserted into the destination table. The following is the equivalent of the previous example using this technique. In this example, columns and values for the MiddleName and NameStyle columns have been omitted. This insert is successful because NULL will automatically be inserted as needed (because these columns allow the NULL value):

INSERT INTO MyContacts (Title, FirstName, LastName, Phone) SELECT 'Mr.', 'Bam Bam', 'Rubble', '234-5678'

Because we've executed the previous two INSERT statements, we'll just return all the records from the MyContacts table (SELECT * FROM MyContacts) and scroll down to view the last two rows, shown in the figure below.


You can see that the ContactID column contains an auto-generated, sequential identity value. The MiddleName column is set to NULL, and IsActive is set to its default value of 0.

Inserting Multiple Records

A convenient new addition to SQL Server 2008 T-SQL is the ability to insert multiple rows in one INSERT statement. This is accomplished with a modified version of the INSERT . . . VALUES pattern. Just add each record to be inserted in a separate set of parentheses, each separated by a comma, as in the following example:

INSERT INTO MyContacts (Title, FirstName, LastName, Phone) VALUES ('Mr.', 'Great', 'Gazoo', '234-9999'), ('Mr.', 'George', 'Slate', '123-4567')

Many new records may be inserted in a single statement this way, although SQL Server limits a single VALUES list to 1,000 row constructors. This is a nice and simple addition to the SQL Server product.

Inserting NULL, Defaults, and Other Column Considerations

A NULL is a special value that essentially means that we didn't want to place any other value in a column. It's the SQL standard way of saying "nothing.” Most common data types have a default value. For example, a numeric type by default is zero. The problem with this is that zero means something in the numerical world. If you are storing manufacturing unit records and a record stores the number of units produced, any number in that column would have meaning. But if your intention is to state that you don't know what that value is, you don't want to store zero because that means that you actually produced zero units. In the MyContacts example, we don't have Bam Bam's middle name. Frankly, we don't even care what it is. Whether we don't know or we don't care, we use the value NULL to store a placeholder for this missing value.

Different settings at the server, database, and user session level for SQL Server can affect the way NULL is available for a particular column. Although not necessarily complicated, this can be a particularly confusing issue because there are a number of variable elements to be considered. Before you read on, please keep in mind that this only becomes an issue if the default settings are altered and, in any case, the situation can be remedied by explicitly defining columns in tables as NULL or NOT NULL.

You can modify database properties through SQL Server Management Studio or by using T-SQL script. To view or change this option, in the Management Studio Object Browser, right-click the name of the database and then choose Properties from the menu. The figure below shows the Properties dialog (Options tab) for SQL Server 2008, and as you can see, the ANSI NULL Default property is set to False.


The same settings can be applied using script similar to the following:

sp_dboption 'AdventureWorks2008', 'ANSI Null Default', 'False'

The following script will override the database default setting when executed prior to creating or altering a table:

SET ANSI_NULL_DFLT_ON ON

When a table is created, the database settings apply unless they are explicitly overridden in the session. When ANSI_NULL_DEFAULT is set to ON (True), all user-defined columns and data types that have not explicitly been defined as NOT NULL during the creation or altering of the table default to allowing NULL values. This is the opposite of the default setting for SQL Server when installed. It is good practice to explicitly define NULL and NOT NULL regardless of the current or eventual setting of ANSI_NULL_DEFAULT; if this setting has been altered, INSERT statements written with the opposite assumption will fail. Note that Check and Default constraints are applied before NULL values.

The Contact table in AdventureWorks for SQL Server 2005 is roughly equivalent to the Person table in AdventureWorks2008. To keep things simple, we are only going to use a subset of the columns from these tables in the following examples. We will provide an example for each of the two product versions. The AdventureWorks2008 Person table contains these columns that are explicitly NOT NULL:

  • BusinessEntityID

  • PersonType

  • NameStyle

  • FirstName

  • LastName

  • EmailPromotion

  • rowguid

  • ModifiedDate

The AdventureWorks Contact table contains these columns that are explicitly NOT NULL:

  • ContactID

  • NameStyle

  • FirstName

  • LastName

  • EmailPromotion

  • rowguid

  • PasswordHash

  • PasswordSalt

  • ModifiedDate

Of these, the primary key (BusinessEntityID or ContactID) value is supplied by the IDENTITY() function. The NameStyle, ModifiedDate, and rowguid are supplied by Default Constraints if no values are explicitly provided (the rowguid's Default Constraint employs the NEWID() function, which can be used to generate a globally unique identifier [GUID]). All other columns, as previously stated, are explicitly defined to either prohibit or allow NULL values.

Inserting Rows from Another Table

Using the INSERT . . . SELECT pattern, we can bring in data from another table or complex query. Conveniently, because we modeled the MyContacts table after the Contact table, we can use it to populate our table. Columns in our select list don't have to have the same names, but they do have to have compatible data types. First, we'll just use a SELECT statement to view the Person or Contact row we intend to insert:

AdventureWorks2008:

SELECT FirstName, LastName, Title FROM Person.Person WHERE BusinessEntityID = '273'

AdventureWorks:

SELECT FirstName, LastName, Title FROM Person.Contact WHERE ContactID = '1010'

This returns one record, as shown in the figure below.


It is a best practice to always test the SELECT statements that will be used in transactional statements, to prevent failures within the INSERT, UPDATE, or DELETE statement. The second step is to add the INSERT statement to the front of this query and make sure the columns match up between the two lists:

AdventureWorks2008:
INSERT INTO MyContacts (FirstName, LastName, Title) SELECT FirstName, LastName, Title FROM Person.Person WHERE BusinessEntityID = '273'

AdventureWorks:

INSERT INTO MyContacts (FirstName, LastName, Title) SELECT FirstName, LastName, Title FROM Person.Contact WHERE ContactID = '1010'

Finally, execute a SELECT statement against the MyContacts table and scroll to the bottom of the list to check the result, as shown in the figure below.


Great care should be taken to limit the size (rows returned) of the SELECT statement with a WHERE clause when querying another table. Each row requested in the SELECT statement will remain locked from changes during the Insert process. This will momentarily block other requests for operations performed on the same table. In an active, multi-user database, nearly every operation carries a statistical possibility of coinciding with a conflicting operation. Most of the time, these operations are simply queued to execute in turn. The database engine will retry a blocked transaction before it times out in error. However, if two competing operations happen to lock the same tables and are waiting for the availability of locked objects before they can continue, this could result in a deadlock situation in the database. The specific behavior will depend on transaction-level locking options and database settings.

SELECT INTO

At times, you may want to populate a new or a temporary table with the results of a query, a technique that can be a real time-saver. Instead of creating a new table the old-fashioned way (by defining all of the columns and data types), and then using an INSERT statement to populate the new table from existing data, this statement simply creates the table on-the-fly using the output and data types from a SELECT statement. Although we try to avoid storing duplicate data when it's not necessary, sometimes it's just easier to copy results into a table for temporary storage — sort of a snapshot of information that alleviates locking of the transactional table by read requests for reports and so on. This might be especially useful to support a complicated report. The results of a complex query may be written to a table either in the current database, a reporting database or warehouse, or the TempDB database, where the data can be reused in other queries. The INSERT INTO statement requires that you target an existing table. If you want to create a new table from the results of the query (from one or multiple tables), use the SELECT INTO statement. In this example, we select a subset of contact records and create a new table called MyOtherContacts:

AdventureWorks2008:

SELECT FirstName, LastName, Title INTO MyOtherContacts FROM Person.Person WHERE Suffix = 'Jr.'

AdventureWorks:

SELECT FirstName, LastName, Title INTO MyOtherContacts FROM Person.Contact WHERE Suffix = 'Jr.'

Keep in mind that the new table is a permanent member of the database like any other table, unless you have placed it in the TempDB database as a temporary table. Typically, in a production environment, a database administrator wouldn't grant the privilege to create new tables to all users. If you only need to use the data for a short time, you can create a temporary table by starting the table name with a pound sign (#) or double pound sign (##). A single pound sign denotes a local temporary object that is only visible from the current connection. A double pound sign denotes a global temporary object that is visible to all connections as long as the connection that created it is still active. Here is the same statement that creates a temporary table:

AdventureWorks2008:

SELECT FirstName, LastName, Title INTO #MyOtherContacts FROM Person.Person WHERE Suffix = 'Jr.'

AdventureWorks:

SELECT FirstName, LastName, Title INTO #MyOtherContacts FROM Person.Contact WHERE Suffix = 'Jr.'

The difference between this and the previous example is that the table will be created in the TempDB database. When the connection used to create this table is closed, SQL Server automatically removes the table and reclaims the storage space. The local temporary table is often used in a stored procedure where cursors are inefficient. The stored procedure first creates the temporary table using a SELECT INTO statement and then returns rows from this table as a result set. When the stored procedure completes and the connection is closed, the local temporary table in TempDB is dropped.

Managing Inserts Using Stored Procedures

In a large-scale production environment, administrators often limit users' ability to insert, update, and delete records directly. Implementing such restrictions while accommodating all user and application needs requires careful planning and can be a lot of work. One popular approach is to deny users all transactional activity against tables and create stored procedures to manage the inserts, updates, and deletes. For each major entity in the database, there will typically be at least three stored procedures: to insert, update, and delete records in that table. Using stored procedures offers the added benefit of enforcing any special business rules, simplifying complex tables and returning custom error messages.

Chapter (Creating and Managing Database Objects) takes a comprehensive look at stored procedures. For now, we'll keep this simple by creating a stored procedure to insert rows into the MyContacts table. The input parameters are listed first. As you can see in the following code example, each of these parameters corresponds to a column in the table.
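A minimal sketch of such a procedure follows; the parameter sizes are assumptions based on the MyContacts table used earlier.

CREATE PROCEDURE spIns_MyContacts
    @FirstName  nvarchar(50),
    @LastName   nvarchar(50),
    @Phone      nvarchar(25)
AS
INSERT INTO MyContacts (FirstName, LastName, Phone)
VALUES (@FirstName, @LastName, @Phone)

-- Hand the newly generated ContactID back to the caller
RETURN @@IDENTITY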

This procedure accepts three input parameters: @FirstName, @LastName, and @Phone. These parameters are used to pass values to the INSERT statement. The system function @@IDENTITY is used to obtain the last identity value — in this case, the new ContactID value generated by this INSERT statement. To test this procedure, execute the following SQL statement:

spIns_MyContacts 'Betty', 'Rubble', '(111) 234-9876'

And just to keep things real, here is an example of a more production-like stored procedure that will insert records into the AdventureWorks Product table. Note that the required (non-null) parameters are listed first. This is to simplify the execution call so optional parameters may be omitted.
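A compact sketch of what such a procedure might look like is shown below. The parameter list is abbreviated and the defaults are assumptions; a full version would cover every column and include error handling.

CREATE PROCEDURE spIns_Product
    -- Required (NOT NULL) columns come first so optional parameters can be omitted in the call
    @Name                 nvarchar(50),
    @ProductNumber        nvarchar(25),
    @SafetyStockLevel     smallint,
    @ReorderPoint         smallint,
    @StandardCost         money,
    @ListPrice            money,
    @DaysToManufacture    int,
    @SellStartDate        datetime,
    -- Optional parameters with default values
    @MakeFlag             bit = 1,
    @FinishedGoodsFlag    bit = 1,
    @Color                nvarchar(15) = NULL,
    @Size                 nvarchar(5)  = NULL,
    @ProductSubcategoryID int = NULL
AS
-- rowguid and ModifiedDate are assumed to be supplied by their default constraints
INSERT INTO Production.Product
    (Name, ProductNumber, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice,
     DaysToManufacture, SellStartDate, MakeFlag, FinishedGoodsFlag, Color, Size,
     ProductSubcategoryID)
VALUES
    (@Name, @ProductNumber, @SafetyStockLevel, @ReorderPoint, @StandardCost, @ListPrice,
     @DaysToManufacture, @SellStartDate, @MakeFlag, @FinishedGoodsFlag, @Color, @Size,
     @ProductSubcategoryID)

RETURN @@IDENTITY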

Modifying Records

When any data in a record or group of records is modified, the user or application making the changes must have exclusive access to the record or records for a short period of time. This locking mechanism is an important part of SQL Server's transaction management model. So, what exactly gets locked? This depends on different factors. SQL Server supports row-level locking and, when feasible, will lock a single row so that neighboring rows don't get locked, affecting other users. This was an issue in earlier versions of SQL Server that supported only page-level locking. In truth, SQL Server will sometimes lock all of the records in an 8KB page, SQL Server's native storage allocation unit. It may do this because it deems this method to be faster or more efficient than locking individual records. Based on inter-table relationships, when a row is modified in one table, locks may be placed on dependent rows in the related table. It may also choose to lock an entire table, groups of tables, or even the entire database, under certain circumstances.

The database engine must make decisions about how it will process each transaction. Based on the scope or number of records involved in a transaction, different locking options may be invoked to manage the transaction as efficiently as possible. With rare exception, this should be completely transparent to all users and operations. Any concurrent or conflicting requests are simply queued and executed in turn. Only after a transaction request has waited several seconds will it time-out and return an error. Fortunately, SQL Server manages record-locking and data modification automatically. Understanding the fundamentals of SQL Server's locking behavior will help you work with the database engine as you modify data. Multi-table updates and other conditions where you should be mindful of these issues are discussed at the end of this chapter.

UPDATE Command

Data is modified one table at a time. A single UPDATE statement can affect one, many, or all records in a table depending on filtering criteria. The syntax is uncomplicated. Column values are modified using the SET keyword.

Filtering Updates

There is no undo feature in T-SQL, short of restoring a backup. Because an update can affect many records, it is important to be absolutely sure of the records you intend to modify. To be cautious, run a trial first using only a SELECT query. Inspect the rows returned and then, after you verify that these are the rows you want to change and that the target values are correct, add the UPDATE command to your query. Here's an example. Our objective is to raise the standard cost for all mountain bikes by 10 percent. We are going to break this down into steps to verify the records and our calculation before actually performing the update.

You may recall that we used a similar query in Chapter (Introducing the T-SQL Language) to raise the StandardCost. If you have worked through either of these exercises, the StandardCost values in your Product table may be different than ours.
The first step is to perform a simple SELECT query. This query returns the product name and current cost:

SELECT Name, StandardCost FROM Production.Product WHERE ProductSubCategoryID = 1

In the results, shown in the figure below, we verify that these are all mountain bikes and that the StandardCost column contains the values we want to modify.


The next step is to calculate the new value for the StandardCost column. We'll create an additional column with the calculated value:

SELECT Name, StandardCost, StandardCost * 1.1 FROM Production.Product WHERE ProductSubCategoryID = 1

Now, we check the calculated value and take a quick look over the entire list to make sure everything is in order (see the figure below).


The calculated value is correct — a 10 percent increase — and the row selection is correct, so we'll move on to the next step and prepare to modify these rows. The FROM clause gets moved to the top and becomes the UPDATE statement. We are dropping the Name column because it is not affected by this query, and then we are using the remaining two references to the StandardCost column to form the SET statement:

UPDATE Production.Product SET StandardCost = StandardCost * 1.1 WHERE ProductSubCategoryID = 1

When this statement is executed in the Query Editor, no results are returned. The following message is displayed on the Messages tab after the name of your server and user name:
(32 row(s) affected)
To verify the results, we'll execute the first query again:

SELECT Name, StandardCost FROM Production.Product WHERE ProductSubCategoryID = 1

This time, the StandardCost values have increased by 10 percent, as shown in the figure below.


When working with important data, we try to make it a point to test our queries in this way before actually performing the update. Often, we won't if it's a simple query and we are certain that we have it right the first time. However, we have learned that it's better to err on the side of caution. In any case, make sure you have a current backup copy of the data just in case things don't go as planned. Now, let's look at some variations of the UPDATE statement. In the previous example, we updated one column value. Modifying multiple values is quite easy. You'll perform this exercise with a little less hand-holding. Begin by executing the following query to view all mountain bike products:

SELECT ProductID, Name, StandardCost, Color FROM Production.Product WHERE ProductSubCategoryID = 1

Make note of the first record. Unless you've run the next query before, the name should be Mountain-100 Silver, 38. Also make note of the StandardCost and Color column values. We want to change the color, cost, and the name. The name value will reflect the new color and a more descriptive frame size. The figure below shows these records.


Now update this record. Each column value assignment is included in a comma-delimited list in the SET statement. This statement shows how to update a product record with multiple column values:
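A sketch of that update follows. The new name, the new cost, and the use of the product name in the WHERE clause are illustrative assumptions.

UPDATE Production.Product
SET Name = 'Mountain-100 Gold, 38mm',   -- new color and a more descriptive frame size
    Color = 'Gold',
    StandardCost = 2200.00              -- hypothetical new cost
WHERE Name = 'Mountain-100 Silver, 38'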

This, of course, modifies only the silver 100 model, 38mm mountain bike. If you run the first of these two queries, you should see that it is now a gold bike along with the other changes made to the cost and name. The results are shown in the figure below.


Updating Rows Based on Multiple Tables

Sometimes you will need to modify records in one table based on conditions in another table. This can be accomplished using a join or subquery. The rule is that you can only update column values in one table. In the following example, we join the SalesOrderHeader and SalesOrderDetail tables. Based on criteria in the header table, we modify the order quantity value in the related detail row:
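Here is one way such an update might be written, using a join in the FROM clause; the filter value and new quantity are hypothetical.

-- Update a detail row (OrderQty) based on criteria evaluated against the header table
UPDATE SOD
SET OrderQty = 2                              -- hypothetical new quantity
FROM Sales.SalesOrderDetail AS SOD
INNER JOIN Sales.SalesOrderHeader AS SOH
    ON SOD.SalesOrderID = SOH.SalesOrderID
WHERE SOH.PurchaseOrderNumber = 'PO12345678'  -- hypothetical criteria on the header table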

In this case, there is only one related detail row but there could be multiple detail rows related to the header row.

When setting values, it's common to derive those values from various system functions. One of the simplest and most common of these functions is GETDATE(), which returns the current date and time. Using this function, we'll stamp an order record to indicate that it has been shipped. First, we'll set the stage for this scenario. In the AdventureWorks2008 database, all the order records have already been marked as having been shipped. Suppose that a customer calls your customer service department asking about their order. You look up the order and see that it is marked as having been shipped to the customer. Upon careful investigation, you discover that the package was returned because the customer's address was incorrect. Your first order of business is to update the record to indicate that it was not shipped (or at least not received by the customer). The order in question is SalesOrderID 43659. This statement will remove the ship date, setting it to NULL:

UPDATE Sales.SalesOrderHeader SET ShipDate = NULL WHERE SalesOrderID = 43659

Now you obtain the correct address and schedule the package for shipment. Once confirmed, you update the order record with the current date and time using the GETDATE() function:

UPDATE Sales.SalesOrderHeader SET ShipDate = GETDATE() WHERE SalesOrderID = 43659

Updating Using Views

Most database professionals agree that the traditional purpose for views is to provide a read-only view of data from tables. One of the most compelling capabilities is that sensitive data can be secured and protected — both selected rows and columns hidden from the user's view. Complicated queries and joins can be represented as if they were a single table. We don't normally think of these "virtual tables" as being updatable, but it is possible to update records through views under certain conditions. Updates can only affect the columns of one table at a time. Further, values created by grouping and aggregation cannot be updated. As a rule, if you intend to create a view to support updates, it should either reference only one table or multiple tables through inner joins. All required columns must be included, and update statements must include only references to a single table within the view. If you plan to perform updates through existing views, be prepared for a bumpy ride as there are very specific requirements about indexes and locking options that can make this practice quite restrictive.

Updating Records Using Stored Procedures

Probably the most comprehensive and secure method for managing record updates is to allow updates only through stored procedures. When this approach is taken, a separate stored procedure is typically used to modify the records for each major entity in the database. The following example is a stored procedure to modify a record in the MyContacts table we created earlier.
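A sketch of the procedure, with parameter sizes assumed to match the MyContacts columns:

CREATE PROCEDURE spUpd_MyContacts
    @ContactID  int,
    @FirstName  nvarchar(50),
    @LastName   nvarchar(50),
    @Phone      nvarchar(25)
AS
UPDATE MyContacts
SET FirstName = @FirstName,
    LastName  = @LastName,
    Phone     = @Phone
WHERE ContactID = @ContactID   -- the key value identifies which record to modify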

Executing this procedure is simple. We just pass the ContactID used to identify the record we want to update along with the values for each column as parameters defined in the procedure script:

spUpd_MyContacts 5, 'Betrice', 'Rubble', '(222) 234-7654'

Here's an example of a slightly more sophisticated stored procedure that updates a record in the Product table. Note that the required parameters are listed first, followed by those with defaults that can be ignored when the procedure is called. We've made a point to include all of the columns in this table just to demonstrate what this would normally look like:
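The abbreviated sketch below shows the general shape of such a procedure. The parameter names and defaults are assumptions, and the full procedure would list every column in the table.

CREATE PROCEDURE spUpd_Product
    -- Required parameters first
    @ProductID            int,
    @Name                 nvarchar(50),
    @ProductNumber        nvarchar(25),
    @StandardCost         money,
    @ListPrice            money,
    -- Optional parameters with defaults may be omitted in the call
    @MakeFlag             bit = 1,
    @FinishedGoodsFlag    bit = 1,
    @Color                nvarchar(15) = NULL,
    @Size                 nvarchar(5)  = NULL,
    @ProductSubcategoryID int = NULL
AS
UPDATE Production.Product
SET Name = @Name,
    ProductNumber = @ProductNumber,
    StandardCost = @StandardCost,
    ListPrice = @ListPrice,
    MakeFlag = @MakeFlag,
    FinishedGoodsFlag = @FinishedGoodsFlag,
    Color = @Color,
    Size = @Size,
    ProductSubcategoryID = @ProductSubcategoryID,
    rowguid = NEWID(),           -- included only to demonstrate setting a value with NEWID()
    ModifiedDate = GETDATE()
WHERE ProductID = @ProductID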

Note that most of the parameters in the list of input parameters (the variable names preceded with @) are assigned default values. Some accept NULL whereas others, such as @MakeFlag and @FinishedGoodsFlag, are set to the bit value 1, or True. This is so these parameters are optional when executing the procedure. The rowguid column is set using the NEWID() function to generate a unique value. This may or may not be appropriate logic in an update procedure and will depend on specific business requirements. We have included this just to demonstrate a variety of techniques for setting values.

This stored procedure is actually simplified. In production, procedures typically include some conditional business logic and error-handling code. These scripts can be time consuming and cumbersome to write. However, once written and debugged, using stored procedures can significantly simplify data management going forward. Chapter (Creating and Managing Database Objects) revisits this stored procedure when discussing the finer points of database programming.

Removing Records

Removing records from a table is very easy to do — maybe too easy. Depending on your viewpoint, this convenience could be a blessing or a curse. Take care before you start practicing your deleting skills on production data. Having said that, we are going to ask SQL Server to delete all of the product records in the AdventureWorks database.

DELETE Command

As you can see, this is a simple statement. There is no need to address specific columns because the DELETE statement removes entire rows of data:

DELETE FROM Production.Product

The FROM clause is actually optional. Personally, we find it a little easier to read this statement with the FROM clause but that's a minor point. This statement does the same thing:

DELETE Production.Product

Did it work? No. SQL Server returned an error, fortunately:

The DELETE statement conflicted with the REFERENCE constraint "FK_ProductInventory_Product_ProductID". The conflict occurred in database "AdventureWorks", table "Production.ProductInventory", column 'ProductID'. The statement has been terminated.

The statement failed because of the foreign key constraint reported in the error. The database won't allow rows to be deleted if there are related rows in another table. What about the product records that do not have related records in another table? That's not going to happen either, because every INSERT, UPDATE, and DELETE statement is automatically wrapped in a transaction. You'll recall that transactions are an all-or-nothing proposition; either all the records are affected or none of them are. In this case, no records are affected.

Embrace the WHERE Clause

Just as when updating records, it's important to test the water and make sure you delete the records you intend to. The Product table has foreign key constraints defined with several tables, which make it difficult to delete existing records. For demonstration purposes, we'll add a few products that we can play with:
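For instance, an insert along these lines would do. The product names, numbers, and prices are made up, and the first two rows are deliberately priced below cost. (This uses the SQL Server 2008 multi-row VALUES form shown earlier; on SQL Server 2005, run three separate INSERT statements instead.)

INSERT INTO Production.Product
    (Name, ProductNumber, StandardCost, ListPrice,
     SafetyStockLevel, ReorderPoint, DaysToManufacture, SellStartDate)
VALUES
    ('Bamboo Kickstand',   'BK-0001', 12.50,  9.99, 10, 5, 0, GETDATE()),
    ('Granite Pedal Set',  'GP-0001', 55.00, 49.99, 10, 5, 0, GETDATE()),
    ('Slate Water Bottle', 'SW-0001',  2.50,  7.99, 10, 5, 0, GETDATE())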

Your objective is to remove all product records for products that would be unprofitable to sell, where the ListPrice is less than the StandardCost. Before actually deleting any records, test your criteria using a SELECT statement:

SELECT ProductID, Name, ProductNumber, ListPrice, StandardCost FROM Production.Product WHERE StandardCost > ListPrice

Two rows meet these criteria, as shown in the figure below.


After verifying that these are the records you intend to delete, write the DELETE statement, appending this WHERE clause to the end to affect the same records:

DELETE FROM Production.Product WHERE StandardCost > ListPrice

These two records are deleted.

Deleting Records Based on Another Table

You may need to remove records from one table based on conditions in another table. Usually there will be a relationship of some kind between these tables. Deletes can be facilitated using any type of join or subquery expression between multiple tables.

In this scenario, suppose that one of your salespeople, Amy Alberts, has left the company and you want to archive all of the related sales orders and detail records. You've already copied these records to their respective archive tables so now all you need to do is delete them. Amy was the salesperson for 39 sales order records with 586 related order detail records.

Before we continue, we would like to work with backup copies of the SalesOrderDetail and SalesOrderHeader tables. This will allow us to delete records without affecting the existing sample data already in the AdventureWorks database. Execute the following script to create a new copy of this table:

SELECT * INTO MySalesOrderDetail FROM Sales.SalesOrderDetail
SELECT * INTO MySalesOrderHeader FROM Sales.SalesOrderHeader

The following queries will use the new MySalesOrderDetail and MySalesOrderHeader tables without removing any records from the existing tables. For the purpose of this exercise, assume that we are using the SalesOrderDetail and SalesOrderHeader tables instead of the copies we just created.

Try It Out

The figure below shows the relationship between the tables involved. By specifying an EmployeeID (on the left side of the diagram), you can delete related SalesOrderDetail rows. The SalesOrderHeader table, which defines orders, is related to the SalesPerson table through the SalesPersonID foreign key; SalesPerson serves as an intersect table between Employee and SalesOrderHeader. Order details, in the SalesOrderDetail table, are related to orders through the SalesOrderID foreign key column. Because SalesOrderDetail is on the outer side of the relationships and nothing else depends on it, begin by deleting rows from this table first.


There are a few techniques and this is one of the easiest. Earlier you saw how the IN() function is used to compare a value to a comma-delimited list of values. The same function can be used to compare a value with a single-column result set. The mechanics are simple: create any SELECT query that returns a single column, and then pass the results of this query to the IN() function used in the WHERE clause of a DELETE statement.

For the SQL Server 2008 AdventureWorks2008 database:
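Something along these lines would work; the join used to resolve Amy Alberts' SalesPersonID by name is an assumption.

DELETE FROM MySalesOrderDetail
WHERE SalesOrderID IN
    ( SELECT SOH.SalesOrderID
      FROM Sales.SalesOrderHeader AS SOH
      INNER JOIN Person.Person AS P
          ON SOH.SalesPersonID = P.BusinessEntityID   -- resolve the salesperson by name
      WHERE P.FirstName = 'Amy' AND P.LastName = 'Alberts' )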

For the SQL Server 2005 AdventureWorks database:
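An equivalent sketch for the older schema, where the salesperson's name is reached through HumanResources.Employee and Person.Contact:

DELETE FROM MySalesOrderDetail
WHERE SalesOrderID IN
    ( SELECT SOH.SalesOrderID
      FROM Sales.SalesOrderHeader AS SOH
      INNER JOIN HumanResources.Employee AS E
          ON SOH.SalesPersonID = E.EmployeeID
      INNER JOIN Person.Contact AS C
          ON E.ContactID = C.ContactID
      WHERE C.FirstName = 'Amy' AND C.LastName = 'Alberts' )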

This deletes the 586 order detail records. Removing the sales order records is easy and requires only a simple change. Because the SalesOrderID is also the identifying column for records in the MySalesOrderHeader table, all you need to do is change the table name in the DELETE statement and the join.

For the SQL Server 2008 AdventureWorks2008 database:
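Changing only the target table in the sketch gives:

DELETE FROM MySalesOrderHeader
WHERE SalesOrderID IN
    ( SELECT SOH.SalesOrderID
      FROM Sales.SalesOrderHeader AS SOH
      INNER JOIN Person.Person AS P
          ON SOH.SalesPersonID = P.BusinessEntityID
      WHERE P.FirstName = 'Amy' AND P.LastName = 'Alberts' )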

For the SQL Server 2005 AdventureWorks database:
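And the same change applied to the sketch for the older sample database:

DELETE FROM MySalesOrderHeader
WHERE SalesOrderID IN
    ( SELECT SOH.SalesOrderID
      FROM Sales.SalesOrderHeader AS SOH
      INNER JOIN HumanResources.Employee AS E
          ON SOH.SalesPersonID = E.EmployeeID
      INNER JOIN Person.Contact AS C
          ON E.ContactID = C.ContactID
      WHERE C.FirstName = 'Amy' AND C.LastName = 'Alberts' )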

This deletes the 39 sales orders. If you were working with the original tables and not the backup copies, it would be necessary to delete the order detail rows first because these records depend on the existence of the sales order records in the SalesOrderHeader table.

TRUNCATE TABLE

The DELETE command is a logged operation. For deletes on a small number of rows or manual operations, there may be a negligible difference in performance. However, for repeated and automated deletes, or on a large volume of records, skipping the row-by-row transaction logging can improve performance dramatically.

The TRUNCATE TABLE command does only one thing — it removes all of the records in a table without logging the individual row deletions. It performs the delete far more efficiently than the DELETE command. You cannot filter specific rows to delete or use any kind of selection criteria. This statement will very efficiently remove all order detail records from the backup copy of the table:

TRUNCATE TABLE MySalesOrderDetail

We are going to do a performance test, comparing the DELETE and TRUNCATE techniques. We don't intend for you to follow along because it would take a bit of work to set this up. Before running the following script, we have created two additional databases, called AW_1 and AW_2. Using an SSIS package, we copied all of the objects and data from AdventureWorks into each of these two identical databases. The following script gets the current time before and after performing each of these two operations. After each, we compare the two times and display the number of milliseconds that it took to complete the operation.
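A sketch of that kind of timing script is shown here; the variable names and the exact comparison logic are assumptions.

DECLARE @Start datetime, @MidPoint datetime, @End datetime

SET @Start = GETDATE()

-- Fully logged, row-by-row delete against the first copy of the database
DELETE FROM AW_1.Sales.SalesOrderDetail

SET @MidPoint = GETDATE()

-- Truncate against the second, identical copy (page deallocation only)
TRUNCATE TABLE AW_2.Sales.SalesOrderDetail

SET @End = GETDATE()

PRINT 'Time to Delete: ' + CONVERT(varchar(10), DATEDIFF(ms, @Start, @MidPoint)) + 'ms'
PRINT 'Time to Truncate: ' + CONVERT(varchar(10), DATEDIFF(ms, @MidPoint, @End)) + 'ms'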

The results are as follows:

(121371 row(s) affected)
Time to Delete: 14030ms
Time to Truncate: 130ms

The TRUNCATE technique takes less than 1 percent of the time it took to process the DELETE statement. The DELETE operation took about 14 seconds and the TRUNCATE operation took about a tenth of a second. That's a big difference! This is because transactional operations are physically written to the transaction log and then rolled forward into the table, one row at a time. The TRUNCATE statement doesn't do all of this. It simply deallocates all the data pages for a table. In reality, data in these pages is not actually changed, but the pointer entries in the index allocation map for these pages are removed. Eventually, data left in the pages gets overwritten but is not really removed.

Another limitation of the TRUNCATE command is that it is a privileged command; by default, only the table owner, database owner, or system administrators can execute it, even if it is encapsulated in a stored procedure.

Automating Inserts, Updates, and Deletes with the MERGE Command

One of the classic challenges when loading a group of records into a table is how to identify and deal with existing records in the target table. A common scenario is to insert a record if it doesn't exist, and update the existing record using the data in the source table if it does exist. In the past, it was necessary to write complex stored procedures to implement all of this logic and explicitly perform the INSERT or UPDATE commands for each record depending on whether certain columns in the source and destination matched up. This technique is commonly referred to as an "UPSERT" (Get it? Update . . . Insert. We guess it wasn't that hard to figure out, huh?)

Thanks to our good friends at Microsoft, SQL Server 2008 now includes a new command called MERGE that automatically implements this "upsert" logic and then performs the appropriate insert or update action based on the criteria provided. The MERGE command can also be used to completely synchronize two different sets of data by also deleting target records based on missing records or other criteria in the source. The language of the MERGE command is not a lot different from what you would write yourself in a query using all of the component commands, but because the logic is optimized and implemented at a lower level within the query processing engine, it should offer better performance than a custom-written stored procedure. To demonstrate, we will create a new table called MyOtherContacts and populate it with a few Contact records.

For the SQL Server 2008 AdventureWorks2008 database:

SELECT TOP 5 CONVERT(nvarchar(10), NULL) AS Title, FirstName, LastName INTO MyOtherContacts FROM Person.Person

For the SQL Server 2005 AdventureWorks database:

SELECT TOP 5 CONVERT(nvarchar(10), NULL) AS Title, FirstName, LastName INTO MyOtherContacts FROM Person.Contact

The Title column is set with NULL values so we can update them later. We have made a point to use the nvarchar data type so it is compatible with any incoming Title values. Next, we'll add some records that don't match any existing Contact rows:

INSERT INTO MyOtherContacts (FirstName, LastName) VALUES ('George', 'Jetson'), ('Jane', 'Jetson')

The figure below shows the target table before we make any changes to the data. Note the five records inserted from the Contact table and the two records we added.


Now, let's review the business rules for the merge query:

  • If the MyOtherContacts table contains any records matching those in the Contact table, we want to update these rows with matching Title values.
  • If the Contact table contains records that don't match any in the MyOtherContacts, they should be inserted into MyOtherContacts.
  • If the MyOtherContacts table contains records that don't match any in the Contact, they should be deleted.

The following example performs these operations in one query. We are limiting the records from the Contact table to keep things manageable and because there are some duplicate FirstName and LastName values in the table that could complicate things. The MERGE statement specifies the table to be affected. The USING clause specifies a table or query to test for matching records, with a join-style condition in the ON clause. To make the updated records more obvious, we are converting the Title values to uppercase. Note the matching logic in the qualifying clauses: MATCHED, NOT MATCHED, and NOT MATCHED BY SOURCE. These clauses determine whether rows in the source match or don't match records in the target, based on the logic in the ON clause.

For the SQL Server 2008 AdventureWorks2008 database:
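Here is a sketch of how the merge might look; the TOP 10 limit and ORDER BY on the source are assumptions used to keep the example manageable.

MERGE INTO MyOtherContacts AS Target
USING ( SELECT TOP 10 Title, FirstName, LastName
        FROM Person.Person
        ORDER BY BusinessEntityID ) AS Source
ON  Target.FirstName = Source.FirstName
AND Target.LastName  = Source.LastName
WHEN MATCHED THEN
    -- Existing rows: update the Title (upper-cased to make the change obvious)
    UPDATE SET Title = UPPER(Source.Title)
WHEN NOT MATCHED THEN
    -- Source rows with no match in MyOtherContacts: insert them
    INSERT (Title, FirstName, LastName)
    VALUES (Source.Title, Source.FirstName, Source.LastName)
WHEN NOT MATCHED BY SOURCE THEN
    -- Rows in MyOtherContacts with no match in the source: delete them
    DELETE;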

For the SQL Server 2005 AdventureWorks database:
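The sketch for the older sample database (still run on SQL Server 2008, since MERGE is a 2008 feature) is the same apart from the source table and key column.

MERGE INTO MyOtherContacts AS Target
USING ( SELECT TOP 10 Title, FirstName, LastName
        FROM Person.Contact
        ORDER BY ContactID ) AS Source
ON  Target.FirstName = Source.FirstName
AND Target.LastName  = Source.LastName
WHEN MATCHED THEN
    UPDATE SET Title = UPPER(Source.Title)
WHEN NOT MATCHED THEN
    INSERT (Title, FirstName, LastName)
    VALUES (Source.Title, Source.FirstName, Source.LastName)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;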

Finally, take a look at the results in the figure below. Note that the five original contact records have titles in uppercase as a result of the UPDATE action. The two records that didn't exist in the Contact table have been deleted, and an additional five contact records have been inserted.


A thorough discussion of the MERGE command would require this book to move from the beginner to the advanced category. Suffice it to say that there is much more you can do with this new product feature by combining the simple application you've seen here with complex joins and conditional logic.

