# Getting XML into the Database - SQL Server 2008

Now that you understand a bit about XML and XPath, we can start talking about how you can get XML into SQL Server. There are several different ways to do this. First, you can just dump your XML into a nvarchar column in the database using a simple INSERT statement. Using this technique is just like entering any text into a column. With SQL Server, you can use the XML datatype rather than a text column.

There are three other ways of getting XML into your database:

• Shred your XML into multiple columns and rows in a single database call. To do this, you can use the OPENXML rowset provider. OPENXML provides a rowset view over an XML document and allows you to write T-SQL statements that parse XML.
• Use updategrams, which are data structures that can express changes to your data by representing a before-and-after image. SQLXML takes your updategram and generates the necessary SQL commands to apply your changes.
• Use SQLXML’s XML BulkLoad provider. Using this provider, you can take a large set of XML data and quickly load it into your SQL Server.

SQLXML is an additional set of technologies that include updategram support, the SQLXML BulkLoad provider, client-side FOR XML support, and SOAP support. For SQL Server 2000, SQLXML 3.0 shipped separately; it doesn’t need to run on a server. With SQL Server 2005 and SQL Server 2008, SQLXML 4.0 ships with the product, but it can also be redistributed on its own. Don’t confuse SQLXML with the SQL Server XML datatype.

On the other hand, OPENXML can be used in stored procedures and UDFs, since it ships as part of the native T-SQL language. You’ll pay a performance penalty for this integration though. OPENXML requires you to use a stored procedure, sp_xml_preparedocument, to parse the XML for consumption. This stored procedure loads a special version of the MSXML parser called MSXMLSQL to process the XML document and, in turn, loads the entire XML document into memory.

Updategrams are very useful for applications where you want to modify your database and you are OK with building an annotated schema and applying those changes through this annotated schema. SQLXML takes the updategram and translates it to SQL Data Manipulation Language (DML) statements.
However, if you need to apply business logic to the SQL DML statements, you’ll be unable to use updategrams, since you cannot access the generated DML statements.

Before we get to the details of using these techniques, we need to look at how to configure SQL Server for SOAP.

SQL Server Configuration for SOAP

With SQLXML 3.0, SQL Server 2000 required an Internet Information Services (IIS) server to listen for SQL commands using either URL queries or SOAP calls. The Internet Server Application Programming Interface (ISAPI) listener will parse out the SQL, execute it, and return a result.

Starting with SQL Server 2005, SQL Server can natively listen on a port for HTTP calls, without requiring IIS. Rather than using the IIS configuration utility from SQLXML 3.0, you can use serverside endpoint support. However, if you still want to keep an IIS server in the mid-tier, you’ll need to continue to use the SQL ISAPI listener included with SQLXML 3.0 against your SQL Server back end. You can also call SQLXML from your applications, since SQLXML supports a managed object model, as you’ll learn later in this chapter in the section “Programming SQLXML from .NET and COM.”

As a simple setup for this chapter, we’ll create an endpoint so we can send our queries and updategrams to the server. The next chapter covers full web services support.

To configure our server, we’ll just issue a CREATE ENDPOINT command and allow our server to listen for T-SQL batches, as follows:

CREATE ENDPOINT pubs
STATE = STARTED
AS HTTP (
path='/pubs',
AUTHENTICATION=(INTEGRATED),
PORTS = (CLEAR)
)
FOR SOAP(
WSDL = DEFAULT,
BATCHES=ENABLED
)
GO

This code creates a virtual directory called pubs. Remember that this virtual directory will not show up in IIS’s virtual directories. Be careful about this, since you may have an endpoint that tries to use the same port as an existing IIS endpoint. You cannot have both SQL and IIS listen on the same ports.

OPEN XML

Rather than needing to parse XML into rows by loading and parsing the XML and then iterating through the XML and generating T-SQL commands, you can use the OPENXML function. The syntax for OPENXML may look difficult at first, but once you try it, you’ll see that it is very approachable.

OPENXML(idoc int [in],rowpattern nvarchar[in],[flags byte[in]])

The first parameter is the integer handle to your XML document. A handle is just a unique integer identifier for your document. You can retrieve this using the built-in sp_xml_preparedocument storedprocedure. When you pass in your XML document as a parameter, the sp_xml_preparedocument procedure parses it and returns the integer you need to pass to the OPENXML function. The XML document you pass can be text based, or you can pass the new XML datatype starting with SQL Server 2005. You can optionally pass the namespace Uniform Resource Identifier (URI) you want for your XPath
expressions. Your usage of this parameter will depend on your usage of namespaces in your XML. If you don’t use any namespaces, you won’t use this parameter in most of your calls to the stored procedure.

Conversely, the sp_xml_removedocument built-in procedure takes the integer handle to your XML document and removes the internal in-memory representation of your XML document that you created with sp_xml_preparedocument. You should call this stored procedure after you are finished with your XML document. If you forget, SQL Server will destroy the in-memory representation once the session that created it disconnects. However, it isn’t good practice to rely on this behavior.

The second parameter to OPENXML is the XPath expression that you want to use to parse out the rows. This expression can be as simple or complex as you require.

The third parameter is optional and allows you to switch from attribute- to element-centric mapping. By default, OPENXML uses attribute-centric mapping, which is a value of 0. You’ll want to switch this to element-centric mapping if your XML is element-centric by specifying a value of 2. By passing a value of 1, you are telling SQL Server to use attribute-centric mapping by default, and for any unprocessed columns, element-centric mapping is used. A value of 8 specifies to not copy overflow text to the @mp:xmltext metaproperty.

Finally, we have the WITH clause. This clause allows you to specify a schema definition for your newly created rowsets, or you can specify a table if you know your XML will map to the schema in a table that already exists. The schema definition uses this format:

ColName ColType [ColPattern | Metaproperty][, ColName ColType
[ColPattern | Metaproperty]...]

where:

• ColName: The name of the column in the table.
• ColType: The SQL datatype you want for the column. If the XML type and the SQL type differ, coercion occurs, which means that SQL Server will try to find the closest native type that can store your data.
• ColPattern: An XPath expression that tells OPENXML how to map the XML value to your SQL column. For example, you may want to explicitly tell OPENXML to use a particular attribute or element of the parent node for a certain column value. If you don’t specify the column pattern, the default mapping you specified, attribute or element, will be used.
• Metaproperty: The metaproperty attribute that you want to put into the column. Metaproperty attributes in an XML document are attributes that describe the properties of an XML item (element, attribute, or any other DOM node). These attributes don’t physically exist in the XML document text; however, OPENXML provides these metaproperties for all the XML items.

The metaproperties allow you to extract information, such as local positioning and namespace information of XML nodes, which provide more details than are visible in the textual representation. You can map these metaproperties to the rowset columns in an OPENXML statement using the ColPattern parameter. Below it shows the different values for the metaproperty attribute.

Metaproperty Values

Let’s now take a look at a couple of examples that use the OPENXML function. The XML document that we will be using is quite simple:

<ROOT>
<authors>
<au_id>172-32-1176</au_id>
<au_lname>White</au_lname>
<au_fname>Johnson</au_fname>
<phone>408 496-7223</phone>
<city>Menlo Park</city>
<state>CA</state>
<zip>94025</zip>
<contract>1</contract>
</authors>
<authors>
<au_id>213-46-8915</au_id>
<au_lname>Green</au_lname>
<au_fname>Marjorie</au_fname>
<phone>415 986-7020</phone>
<city>Oakland</city>
<state>CA</state>
<zip>94618</zip>
<contract>1</contract>
</authors>
</ROOT>. . .

First, let’s just take our XML document and store it in a relational table. Since we’re using pubs already, we’ll take the data and store it in a new authorsXML table in pubs. We’ll simply accept the defaults and not fill in any optional parameters for the OPENXML function. The code will take our XML document using element-centric mapping, parse the document, and place it into the authorsXML table.

CREATE TABLE [authorsXML] (
[title] [varchar] (20),
[au_id] [varchar] (11)
) ON [PRIMARY]
GO
DECLARE @idoc int
DECLARE @doc varchar(1000)
SET @doc ='
<ROOT>
<authors><au_id>172-32-1176</au_id><au_lname>White</au_lname>
<au_fname>Johnson</au_fname><title>book1</title>
<city>Menlo Park</city><state>CA</state><zip>94025</zip>
<contract>1</contract></authors>
<authors><au_id>213-46-8915</au_id><au_lname>Green</au_lname>
<au_fname>Marjorie</au_fname><title>book2</title>
<zip>94618</zip>
<contract>1</contract></authors>
</ROOT>'
--Create an internal representation of the XML document.
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
-- Execute a SELECT statement that uses the OPENXML rowset provider.
INSERT AuthorsXML (title, au_id)
SELECT title, au_id
FROM OPENXML (@idoc, '/ROOT/authors',2)
WITH (au_id varchar(11),
au_lname varchar(40),
au_fname varchar(20),
title varchar(20),
phone char(12)
)
EXEC sp_xml_removedocument @idoc

If we tweaked the preceding statement and removed the INSERT and instead just did a SELECT on our data, such as SELECT *, SQL Server would return our parsed XML as a rowset. The results would look as follows:

Now, you may realize that we’re not storing some of the XML, such as the address, city, state, ZIP code, and contract values. If we wanted to, we could capture the XML document by creating another column and using the @mp:xmltext command in our schema definition, like this:

catchall nvarchar(1000) '@mp:xmltext'

The next example shows how to navigate an XML document using an XPath expression in OPENXML. Since OPENXML returns a relational rowset, you could actually join the results with another table and then store this rowset in your table. After calling OPENXML, your XML data can be treated just like any other relational data. Here, we’ll use the returned XML rowsets to join data with the publishers table to return only authors who have the same city as a publisher.

DECLARE @idoc int
DECLARE @doc varchar(1000)
SET @doc ='
<ROOT>
<authors><au_id>172-32-1176</au_id><au_lname>White</au_lname>
<au_fname>Johnson</au_fname>
<city>Menlo Park</city><state>CA</state><zip>94025</zip>
<contract>1</contract>
<books>
<title>My book1</title>
<title>My book2</title>
</books>
</authors>
<authors><au_id>213-46-8915</au_id><au_lname>Green</au_lname>
<au_fname>Marjorie</au_fname>
<city>Boston</city><state>MA</state>
<zip>94618</zip><contract>1</contract>
<books>
<title>My book3</title>
<title>My book4</title>
</books>
</authors>
</ROOT>'
--Create an internal representation of the XML document.
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
SELECT a.title, a.au_lname, p.pub_name, p.city
FROM OPENXML (@idoc, '/ROOT/authors/books',2)
WITH (title varchar(20) './title',
au_id varchar(11) '../au_id',
au_lname varchar(40) '../au_lname',
au_fnamevarchar(20) '../au_fname',
phone char(12) '../phone',
city varchar(20) '../city'
) AS a
INNER JOIN publishers AS p
ON a.city = p.city
EXEC sp_xml_removedocument @idoc

The results should look as follows:

The best way to use OPENXML is in a stored procedure, especially if you are taking your XML from the mid-tier and putting it into the database. Rather than parsing in the mid-tier, you can send your XML as text to the stored procedure and have the server parse and store it in a single operation. This provides a lot better performance and a lot less network traffic than parsing the XML in the mid-tier and sending T-SQL commands to the server to store the data.

If you are going to use your newly parsed XML over and over again, then rather than calling OPENXML multiple times, just store the results in a table variable. This will speed up the processing and free resources on the server for other work. The sample stored procedure that follows implements OPENXML. Notice the use of the new nvarchar(max) datatype. In SQL Server 2000, you would have to use a text datatype. For all new development, use the nvarchar(max) datatype, since the text datatype may be removed in future versions.

CREATE PROCEDURE update_authors_OPENXML (
@doc nvarchar(max))
AS
SET NOCOUNT ON
-- document handle:
DECLARE @idoc INT
--Create an internal representation of the XML document.
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
-- Execute a SELECT statement that uses the OPENXML rowset provider.
INSERT AuthorsXML (title, au_id)
SELECT title, au_id
FROM OPENXML (@idoc, '/ROOT/authors/books',2)
WITH (title varchar(20) './title',
au_id varchar(11) '../au_id',
au_lname varchar(40) '../au_lname',
au_fname varchar(20) '../au_fname',
phone char(12) '../phone'
)
--Execute SPROC
EXEC update_authors_OPENXML '
<ROOT>
<authors><au_id>172-32-1176</au_id><au_lname>White</au_lname>
<au_fname>Johnson</au_fname><phone>408 496-7223</phone>
Park</city><state>CA</state><zip>94025</zip><contract>1</contract>
<books>
<title>My book1</title>
<title>My book2</title>
</books>
</authors>
<authors><au_id>213-46-8915</au_id><au_lname>Green</au_lname>
<au_fname>Marjorie</au_fname><phone>415 986-7020</phone>
<zip>94618</zip><contract>1</contract>
<books>
<title>My book3</title>
<title>My book4</title>
</books>
</authors>
</ROOT>'

XML Views Using Annotated XML Schemas

XML Schemas define the structure of an XML document, in the same way that a relational schema defines the structure of a relational database. With schemas, you can define what makes an XML document legal according to your specifications. For example, you can define the elements, attributes, hierarchy of elements, order of elements, datatypes of your elements and attributes, and any default values for your elements and attributes. Schemas are not required in your XML documents but are recommended, especially if you’ll be sharing your XML data with other applications that may not understand your XML data or how to correctly create that XML data without understanding your schema. The standard for schemas is XML Schema Definition (XSD).

With SQL Server, you can create an XML Schema that maps to your relational structure using some special schema markup. This is useful when you want to create an XML view of your underlying relational data. This view not only allows you to query your relational data into XML, but you can also persist changes using updategrams and SQLXML bulk-loading. It takes some work to create the annotated schema, but if you are going to be working extensively with XML, the extra work is worth the effort. Plus, you’ll want to use annotated schemas with updategrams, which you’ll learn about in the section “SQLXML Updategrams” later in this chapter.

Visual Studio includes a very capable XML Schema editor so that you don’t need to generate XML Schemas by hand. Following is a typical XML Schema for the authors XML that we were using previously. As you can see, the XML Schema is an XML document. The system knows it is a schema document because we declare a namespace, xs, that uses the XSD namespace. This namespace is a reference to the World Wide Web Consortium (W3C) XSD namespace, which is http://www.w3.org/2001/XMLSchema. This reference is aliased to xs and then all elements use this alias as their prefix inside of the schema.

Also, notice how the schema declares an element called AuthorsXMLNew, which contains the rest of your XML data. Then there is a complex type that declares a sequence of XML elements. These elements include the ID, first name, last name, phone, and so on, of the authors. Notice how the elements also declare a type. Schemas can define datatypes for your elements and attributes. We declare some strings, an unsigned int, and an unsigned byte. You can declare other datatypes beyond what this schema has, such as dates, Booleans, binary, and other types.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified"
elementFormDefault="qualified"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="AuthorsXMLNew">
<xs:complexType>
<xs:sequence>
<xs:element name="au_id" type="xs:string" />
<xs:element name="au_lname" type="xs:string" />
<xs:element name="au_fname" type="xs:string" />
<xs:element name="phone" type="xs:string" />
<xs:element name="city" type="xs:string" />
<xs:element name="state" type="xs:string" />
<xs:element name="zip" type="xs:unsignedInt" />
<xs:element name="contract" type="xs:unsignedByte" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

Now that we have a base schema, if we want to use this annotated schema with SQLXML, we need to make some changes. First, we need to add a reference to the XML Schema mapping. To do this, we need to modify our XML Schema by first adding the namespace for SQL Server’s schema mapping, which is urn:schemas-microsoft-com:mapping-schema. This schema allows us to map our XML Schema to our relational database schema. We’ll alias this namespace with sql so that we can use the prefix sql: when we refer to it. Therefore, if we wanted to modify the preceding schema to support SQL Server mapping, we would use this new schema:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="XMLSchema1" targetNamespace="http://tempuri.org/XMLSchema1.xsd"
elementFormDefault="qualified"
xmlns="http://tempuri.org/XMLSchema1.xsd"
xmlns:mstns="http://tempuri.org/XMLSchema1.xsd"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
...

You’ll also see the use of the urn:schemas-microsoft-com:xml-sql namespace in documents. This namespace provides access to SQLXML functionality that can be used in templates or XPath queries.

Default and Explicit Mapping

You may notice that the preceding schema just adds the namespace for the mapping. The schema isn’t listed since SQL Server supports default mapping between your relational schema and your XML Schema. For example, the authors complex type would be automatically mapped to the authors table. The au_id string would automatically map to the au_id column, and so on.

You can also explicitly map between your schema and your SQL datatypes. For very simple applications, you can use the default mapping. In most cases, you’ll use explicit mapping since your XML and relational schemas may be different or you’ll want more control over how the mapping is performed or the datatypes are used. You use the sql:relation markup, which is part of the SQLXML mapping schema, to specify a mapping between an XML item and a SQL Server table. For columns, you use the sql:field markup. You can also include a sql:datatype to explicitly map your XML datatype to a SQL Server datatype so that implicit conversion doesn’t happen. Therefore, if we were to add these markups rather than using the default mapping for our schema, our schema would change to look like the following code:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified"
elementFormDefault="qualified"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
<xs:element name="AuthorsXMLNew" sql:relation="AuthorsXMLNew">
<xs:complexType>
<xs:sequence>
<xs:element name="au_id" type="xs:string" sql:field="au_id" />
<xs:element name="au_lname" type="xs:string" sql:field="au_lname" />
<xs:element name="au_fname" type="xs:string" sql:field="au_fname" />
<xs:element name="phone" type="xs:string" sql:field="phone" />
<xs:element name="city" type="xs:string" sql:field="city" />
<xs:element name="state" type="xs:string" sql:field="state" />
<xs:element name="zip" type="xs:unsignedInt" sql:field="zip" />
<xs:element name="contract" type="xs:unsignedByte"
sql:field="contract" sql:datatype="bit" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

Relationships

Since in a relational database you can relate data by keys, you can use annotated schemas to describe those relationships in your XML. However, annotated schemas will make those relationships hierarchical through the use of the sql:relationship mapping. You can think of this as joining a table. The relationship mapping has a parent element that specifies the parent relation or table. It also has a parent-key element, which specifies the key to use, and this key can encompass multiple columns. Also, you have child and child-key elements to perform the same functionality for the child as the other elements do for the parent.

There is also inverse functionality, so you can flip this relationship. If for some reason your mapping is different from the primary key/foreign key relationship in the underlying table, the inverse attribute will flip this relationship. This is the case with updategrams, which you’ll learn about in the section “SQLXML Updategrams” later in this chapter. You’ll use this attribute only with updategrams.

Imagine we had our authors and the authors were related to books in our relational schema through the use of an author ID. We would change our schema mapping to understand that relationship by using the following XML Schema. Notice how the relationship mapping is in a special section of our XSD schema.

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="XMLSchema1" targetNamespace="http://tempuri.org/XMLSchema1.xsd"
elementFormDefault="qualified"
xmlns="http://tempuri.org/XMLSchema1.xsd"
xmlns:mstns="http://tempuri.org/XMLSchema1.xsd"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Root">
<xs:complexType>
<xs:sequence>
<xs:element name="Authors" sql:relation="Authors">
<xs:complexType>
<xs:sequence>
<xsd:element name="Books" sql:relation="Books">
<xsd:annotation>
<xsd:appinfo>
<sql:relationship name="BookAuthors"
parent="Authors"
parent-key="au_id"
child="Books"
child-key="bk_id" />
</xsd:appinfo>
</xsd:annotation>
<xsd:complexType>
<xsd:attribute name="bk_id" type="xsd:integer" />
<xsd:attribute name="au_id" type="xsd:string" />
</xsd:complexType>
</xsd:element>
<xs:element name="au_id" type="xs:string"
sql:field="au_id"></xs:element>
<xs:element name="au_lname" type="xs:string"
sql:field="au_lname"></xs:element>
<xs:element name="au_fname" type="xs:string"
sql:field="au_fname"></xs:element>
. . .
<xs:element name="contract" type="xs:boolean"
sql:field="contract"
sql:datatype="bit"></xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

Key Column Mapping Using sql:key-fields

Now that you’ve seen how to build relationships, you also need to look at how to make SQL Server nest your XML data correctly. For nesting to correctly occur, you’ll want to specify the key columns used in your table that make the most sense when creating XML hierarchies. To give SQL hints on the correct ordering, use the sql:key-fields mapping, which tells SQL which columns contain key values. The sample that follows lets SQL Server know that the au_id column is a key column:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="XMLSchema1" targetNamespace="http://tempuri.org/XMLSchema1.xsd"
elementFormDefault="qualified"
xmlns="http://tempuri.org/XMLSchema1.xsd"
xmlns:mstns="http://tempuri.org/XMLSchema1.xsd"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Root">
<xs:complexType>
<xs:sequence>
<xs:element name="Authors" sql:relation="Authors"
sql:key-fields="au_id">
...

Excluding Data from the XML Result Using sql:mapped

Using the sql:mapped syntax, you can specify whether to map an element or attribute in your XSD schema to a database object. If you don’t want to have the default mapping occur and you don’t want to have the XML appear in your results, you should use the sql:mapped attribute.

There may be times when there is extraneous XML that you don’t want to appear in your table; for example, if you don’t control the XML Schema and want to omit the data from your table since a column for the data doesn’t exist in the table. This attribute has a Boolean value, with true meaning that mapping should occur and false meaning that mapping shouldn’t occur.

Creating a Constant Element

If you want an element to be constant in your XML document even if there is no mapping to the underlying database, you should use the sql:is-constant mapping. This mapping is Boolean, and a value of true makes the element always appear in your XML document. This mapping is very useful for creating a root element for your XML. The following is an example of using this mapping:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="XMLSchema1"
targetNamespace="http://tempuri.org/XMLSchema1.xsd" ...>
<xs:element name="Root" sql:is-constant="true">
...

Limiting Values by Using a Filter

You may want to filter the results returned to your XML document by values from your database. The sql:limit-field and sql:limit-value mappings let you do this by allowing you to specify a filter column and the value to limit that column by. You don’t need to specify the limit value if you don’t want to since SQL Server will default this to null. You can also have multiple limiting values for multiple mappings. The shortened example that follows shows a schema that limits authors who live in Boston:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="XMLSchema1"
targetNamespace="http://tempuri.org/XMLSchema1.xsd" ...>
<xs:element name="Root" sql:is-constant="true">
...
<xs:element name="Authors"
sql:relation="Authors"
sql:limit-field="city"
sql:limit-value="Boston">
...

Other Features in Schema Mapping

The majority of your mapped schemas will use the preceding mappings. Table briefly summarizes the other mapping technologies. To see annotated schemas in action, take a look at the integrated example in the “the “Programming SQLXML from .NET and COM” section later in this chapter.

Other Schema Mapping Features

SQL XML Updategrams

So far, you’ve seen how to shred data using OPENXML and how to get data out of SQL Server using an annotated schema. Updategrams build upon the annotated schema concept. Updategrams allow you to change data in SQL Server using an XML format. Rather than writing T-SQL, you specify your changes to your data using before-and-after images specified in an XML format. You can execute these updategrams from ADO or ADO.NET as you’ll see in the full example in the “Programming SQLXML from .NET and COM” section later in this chapter.

The first step towards understanding XML updategrams is to understand the namespace they use, namely urn:schemas-microsoft-com:xml-updategram. This namespace is usually abbreviated to updg as part of your namespace declaration.

Every updategram must contain at least one sync element, which is an XML element that contains the data you want to change in the form of before and after elements. You can have multiple sync elements, and each element is considered a transaction, which means that everything in that sync block is either completely committed or entirely rolled back. The before element contains the data as it was before the change. You’ll want to specify a key so that SQL Server can find the data that you want to change. You can modify only one row in your before element.

The after element is the changed data. You can imagine that an insertion will have an after but no before. On the other hand, a delete will have a before but no after. Finally, an update will have both a before and an after.

The following is an updategram in its simplest form:

<ROOT xmlns:updg="urn:schemas-microsoft-com:xml-updategram">
<updg:sync [mapping-schema= "AnnotatedSchemaFile.xml"] >
<updg:before>
...
</updg:before>
<updg:after>
...
</updg:after>
</updg:sync>
</ROOT>

You’ll notice that you can optionally specify an annotated schema file that will map explicitly the elements in your updategram to columns in your tables. If you don’t specify an annotated schema file, SQL Server will use default mapping, as you saw in the annotated schema mapping section. It is also important to note that you can mix and match element- or attribute-based mapping. However, for the sake of clarity, our recommendation is to select one style or the other.

To specify a null value with an updategram, you use the sync element’s nullvalue attribute to specify the placeholder for the null. For example, if you wanted the value of "nothing" to be null, you would use the following updategram, which uses attribute-based syntax:

<?xml version="1.0"?>
<authorsupdate xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<updg:sync updg:nullvalue="nothing">
<updg:before>
<Authors au_id="172-32-1176"/>
</updg:before>
<updg:after>
<Authors state="nothing" phone="nothing"/>
</updg:after>
</updg:sync>
</authorsupdate>

You can also use parameters with your updategrams by specifying $parametername. For example, if you wanted to create a parameter for the selection of the author, you would use the following updategram: <?xml version="1.0"?> <authorsupdate xmlns:updg= "urn:schemas-microsoft-com:xml-updategram"> <updg:sync updg:nullvalue="nothing"> <updg:before> <Authors au_id="$AuthorID"/>
</updg:before>
<updg:after>
<Authors state="nothing" phone="nothing"/>
</updg:after>
</updg:sync>
</authorsupdate>

If you want to use identity columns and you want to pass the identity values between tables, you can use the at-identity attribute. This attribute is a placeholder that you can include, and SQL Server will provide the correct value for it when processed. If you want to pass the identity value back to the client, you can use the returnid attribute. SQL Server will then return an XML document containing the identity value after the updategram is applied successfully to the server.

An example will make this all clearer. If we wanted to insert a new author into our authors table, delete an existing author, and change the values for yet another author, we would use the following updategram against our authors table. The next section shows how to program in .NET using the SQLXML classes to execute this code.

<?xml version="1.0"?>
<authorsupdate xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<updg:sync updg:nullvalue="nothing">
<updg:before>
</updg:before>
<updg:after>
<Authors au_id="123-22-1232" au_fname="Tom" state="WA" phone="425-882-8080"/>
</updg:after>
<updg:before>
<Authors au_id="267-41-2394"/>
</updg:before>
<updg:after>
</updg:after>
<updg:before>
<Authors au_id="238-95-7766"/>
</updg:before>
<updg:after>
<Authors city="Oakland" phone="212-555-1212"/>
</updg:after>
</updg:sync>
</authorsupdate>

If you want to load a large set of XML data into SQL Server, you’ll want to use the XML bulk-loading
capabilities of SQLXML. Don’t—we repeat don’t—use updategrams or OPENXML. You’ll find performance lacking with these two components for loading large amounts of XML data. Of course, you may be wondering what makes up a large amount of XML data. Well, it depends on a number of factorssuch as the size and complexity of your XML. You could be loading hundreds of small XML files, or youcould be loading one big XML file. If you have fast processors, a lot of memory, and fast disks on your server, you could possibly get away with using OPENXML. Our recommendation is to run a test on yoursystem to see which method performs acceptably with the data volume that you intend to run.

XML BulkLoad leverages the mapping schema technology that we’ve been talking about in this chapter. The mapping schema will tell the BulkLoad component where to place your XML data in the database. The object model for XML BulkLoad is very straightforward. There is one method called Execute and a lot of properties that allow you to configure how to handle the bulk-loading, as shown in Table. The Execute method takes two parameters. The first is the path to the schema mapping file. The second optional parameter is a path or stream to the XML file you want to import.

The first property that you should understand is the Transaction Boolean property. Normally, you want to leave this property false to make the load nontransacted. This will increase your performance at the cost of not being able to roll back if there is a failure.

The next property is the XMLFragment Boolean property. If you set this to true, BulkLoad allows XML fragments, which are XML documents with no root element.

If you are working with constraints and you want those constraints enforced as part of your BulkLoad, you’ll want to set the CheckConstraints property to true. By default, BulkLoad turns off constraint checking, which improves performance. Regardless of whether you set this to true or false, you’ll want to place primary keys ahead of a table with a foreign key in your mapping schema.

If you want to ignore duplicate keys, you need to set the IgnoreDuplicateKeys Boolean property to true. This is useful if you get data feeds where the person providing the data feed may not know what data is in your database and you don’t want the bulk-loading operation to fail because of duplicate keys. BulkLoad will not commit the row with the duplicate key, but instead just jump over that row in processing.

Many database designers use identity columns to guarantee uniqueness of keys in the table. Sometimes the XML you are loading has an identity-like element that you may want to use rather than having SQL Server generate a value using its own algorithm. To do this, set the KeepIdentity property to true. This is the default value for this property. One thing to remember is that it is a global value, so you cannot have SQL Server generate some identities and have BulkLoad pull from the XML for others.

The KeepNulls property defaults to false with BulkLoad. BulkLoad will not automatically insert null as the value for any column that is missing a corresponding attribute or element in the XML document. If you set this property to true, you must be careful here since BulkLoad will fail if you don’t allow nulls in those columns. BulkLoad will not assign the default value for a column, if one is specified in SQL Server, if the property is true.

One interesting BulkLoad property is ForceTableLock, which locks the table as BulkLoad loads its data. This will speed performance of the load at the cost of locking other users out of the table. The default value is false, so BulkLoad acquires a table lock each time it inserts a record into the table.

If your target tables don’t already exist, BulkLoad can create the tables for you. You need to set the SchemaGen property to true to have BulkLoad perform this functionality. BulkLoad will take the datatypes from your schema mapping and autogenerate the correct database schema based on those datatypes. If a table or column already exists with the same name and you want to drop and re-create them, set the SGDropTables property to true.

The next section shows using BulkLoad from a managed environment. BulkLoad supports both COM and .NET so you can program from both environments with this technology.