Using the XML Datatype - SQL Server 2008

SQL Server has an XML datatype you can use to natively store XML data in SQL Server databases. If you are still using SQL Server 2000, you can store XML, but it must be in a string-based column, or you must shred the data into relational columns using OPENXML or BulkLoad, as you saw in the previous chapter. By using a native XML type, SQL Server 2005 and 2008 can support richer operations against your XML data, such as constraints, cross-domain queries that combine relational data and XQuery, and XML indexes.

Another benefit of using the native XML datatype is that XML data is inherently different from relational data in its structure. XML data is in a hierarchical structure that can be recursive, and XML supports a different query language than relational systems.

There are many scenarios where using relational modeling is a better choice than XML, and vice versa. For example, if you have data that is very interrelated, such as customers, their orders, the products in the orders, and the stores that sell those products, you could try to implement a solution using XML, but it would be quite challenging. How do you structure your hierarchy? Do you want a customer to be a top-level node and then have orders for each customer appear underneath? How do you write a query that returns all customers with at least five orders, where each order is greater than $1,000, and the name of the store where the customers purchased the products? Another problem is that you will repeat data throughout the hierarchy, such as product names, product prices, and so on, because of the hierarchical nature of XML. Plus, if you want to delete a customer but not the products or orders under that customer, you can’t do so, because the orders and products are children under the customer element. On the other hand, using a relational model as a solution, you can quickly model your data and query the information.

You may be thinking that in this scenario, you should just shred your XML data into the relational database, as you saw in the previous chapter. However, shredding has its own issues, in that you do not always get back what you put in, since you are not guaranteed the same XML when you reconstitute the shredded data. Shredding adds another layer of complexity in terms of code creation and maintenance. Also, any reasonably complex XML document will need to be shredded across many tables, requiring extensive join operations across those tables to reconstitute the XML. You’ll also end up with a complex, annotated schema full of tables and many foreign key relations into which to shred that XML.

Now, there are scenarios where modeling your data using XML is very useful. First, XML can be more flexible than relational models. So, if you need a free-form structure to store data, XML can be a good choice. Also, XML is self-describing and easily portable across applications or even platforms. Plus, if your data has sparse entries or needs rich multivalue functionality, XML is a good choice as your data format. Finally, if you truly have document-centric data such as Microsoft Office documents, you will want to store this information as XML, since Microsoft Office documents lack rigid structures. XML provides the flexibility to store and query the information in the documents in a rich way.

Even if you choose XML as the format for your data, you will need to decide between using the XML datatype, shredding your XML into relational columns, and storing the XML using the (n)varchar(max) or varbinary(max) type. If you care about the order of elements, and you want the ability to use XML programming paradigms such as XPath and XQuery, you will want to use the XML datatype to store your XML data. If your XML data is best represented using a relational model, you can shred your data into relational columns using annotated schemas, just as you could in SQL Server 2000. Finally, if you need to preserve the XML data exactly as it was created, including whitespace and attribute ordering, then you will want to store the XML in an (n)varchar(max) or a varbinary(max) column. Some scenarios (such as legal documents) may require this.

Finally, SQL Server can support a hybrid model, whereby you may use the XML datatype but promote certain properties—for example, key document properties such as author, last modification time, or last save time—into relational columns, or you may shred your XML into relational columns but keep the original copy in an nvarchar column. SQL Server provides the flexibility to meet the needs of your application when working with XML data.

We want to make one thing very clear, though, since this will cause you issues in the long term if you do not remind yourself of it regularly: if your data is quite structured, in that your XML does not look hierarchical and is normalized, you should use the relational model. Do not use XML. XML is targeted at semistructured or unstructured data. If you need to dynamically add schemas or data on the fly that you never expected, XML is your best choice. Do not make the mistake of thinking everything is a nail to bang with the XML hammer in SQL Server.

Understanding How XML Is Stored by SQL Server

Before we discuss how to create a column of type XML, let’s first look at how SQL Server stores XML. You may be wondering how, under the covers, SQL Server translates XML into something that is performant when running queries against the XML data. One thing we can guarantee is that XML is not stored as text!

When you create a column using the XML datatype, SQL Server takes the XML and converts it into a binary XML format. One reason for this is that it’s faster to index and search binary data than plain text data. A second reason is that you can compress binary data much more easily than plain text data. SQL Server will tokenize the XML and strip out portions of the markup. If you look at many XML documents, you can see that they have redundant text throughout for element or attribute markup. With the XML datatype, this redundancy can be removed, and your data can be compressed.

The XML datatype is implemented using the varbinary(max) datatype under the covers to store the binary XML. If the converted binary XML is small enough to fit in the row, SQL Server stores the binary XML in the row. If the XML is too large, a 24-byte pointer that points to the binary XML is left in the row. With the built-in compression, you should expect an average of 20 percent compression of the XML when storing it in the XML datatype. Of course, this will depend on the number of tags you have in the document and the redundancy of your text. As you will see, using typed XML is preferable to untyped. When types are specified, you can get better performance and compression of the XML datatype because SQL Server does not need to do type conversions and can parse data faster.

If you ever want to see how much compression is achieved between storing your XML using nvarchar and using the XML datatype, you can use the DATALENGTH function. The following example compares using nvarchar and the XML datatype with the XML we use as our sample XML in this chapter:

select DATALENGTH(N'<?xml version="1.0" standalone="yes"?>
<people>
<person>
<name>
<givenName>Martin</givenName>
<familyName>Smith</familyName>
</name>
. . .
') as NVARLEN,
DATALENGTH(CAST(N'<?xml version="1.0" standalone="yes"?>
<people>
<person>
<name>
<givenName>Martin</givenName>
<familyName>Smith</familyName>
</name>
. . .
' AS XML)) as XMLLEN
Results:
NVARLEN: 1154
XMLLEN: 324

As you can see, we save about 3.5 times the space using the XML datatype. The reason for this is that many tags in this XML repeat, and the XML datatype can strip these tags when it stores the XML data in SQL Server. Depending on the redundancy in your XML, you should find similar savings in size.

Creating XML Columns

The following code creates a new table that contains a standard relational primary key column as well as an XML column. This example uses untyped XML:

CREATE TABLE xmltbl (pk INT IDENTITY PRIMARY KEY, xmlCol XML not null)

You can have multiple XML columns in a single table. One thing you will notice is that there is no XML schema associated with the XML column.

SQL Server supports both untyped and typed XML columns. Untyped columns have no schema associated with them. Typed columns have XML Schemas to which the XML documents inserted into the column must conform. Whenever possible, you will want to associate XML Schemas with your XML columns, so that SQL Server will validate your XML data, make better optimizations when querying or modifying your data, perform better type checking, and optimize the storage of your XML data.

As you saw earlier, SQL Server stores XML data in a proprietary binary format for speed and compression. With an index, the server can find the information more quickly, but there is a bit of a performance hit when you insert your data.

Defining Typed XML Columns

To create a typed XML column, you need to load your schema into SQL Server and then associate it with the column in question. Once you’ve done this, only documents that adhere to your schema can be inserted into the table. You can have one or many schemas associated with an XML column. The following code creates a new table that uses a schema on an XML datatype, so it is a typed XML column:

-- Create a new database for the samples
USE master
DROP DATABASE xmldb
IF NOT EXISTS (SELECT * FROM sys.databases WHERE name = 'xmldb')
CREATE DATABASE xmldb
GO
--Declare the variable for the XML
DECLARE @x XML
-- Open the XSD schema
SELECT @x = s
FROM OPENROWSET (
BULK 'C:Customer.xsd',
SINGLE_BLOB) AS TEMP(s)
select @x
-- Make sure the schema does not exist already
IF EXISTS(select * from sys.xml_schema_collections where name='Customer')
DROP XML SCHEMA COLLECTION Customer
-- Create the schema in the schema collection for the database
CREATE XML SCHEMA COLLECTION Customer AS @x
GO
-- Create a table that uses our schema on an XML datatype
CREATE TABLE xmltbl2 (pk INT IDENTITY PRIMARY KEY,
xmlColWithSchema XML (CONTENT Customer))
GO

First, you need to load your XML Schema into SQL Server. The code uses OPENROWSET to open the XML Schema file stored in the file system. The code assigns the schema to the variable x. Next, the code drops the schema if it exists. Here, you will see the use of the dynamic management views for XML, which we cover later in this chapter. SQL Server includes views for querying the schema collections, schema namespaces, and XML indexes.

If the schema collection does not exist, the code creates the schema collection in SQL Server. Schema collections are scoped to the database where they are created. Schema collections cannot span databases or instances, so you may need to create the same schema in multiple locations if you use the same schema for multiple, different XML datatypes in different databases. You can have more than one XML Schema in your database. In addition, you can assign more than one XML Schema to a column that uses the XML datatype.

One caveat with schemas is that once you create a schema, you cannot modify or drop it until all references to it are removed from the system. For example, if an XML column in your table references a schema, you will not be able to modify that schema. SQL Server will return a message stating that the schema is in use and will include the name of the components using the schema.

Your schema is loaded into SQL Server’s metadata and can be viewed using the sys.xml_schema_collections metadata view. If you want to retrieve the schema after you load it into the system, you will need to use the xml_schema_namespace function. This function takes two parameters: the first is the relational schema in which your XML Schema was created, and the second is the name of the schema you want to retrieve. The following code retrieves the Customer schema created in the previous example:

--Return the schema
USE xmldb
go
SELECT xml_schema_namespace(N'dbo',N'Customer')
go
Here is the returned XML Schema from this call:
<xsd:schema xmlns:xsd=http://www.w3.org/2001/XMLSchema
xmlns:t="urn:example/customer" targetNamespace="urn:example/customer"
elementFormDefault="qualified">
<xsd:element name="NewDataSet">
<xsd:complexType>
<xsd:complexContent>
<xsd:restriction base="xsd:anyType">
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="t:doc" />
</xsd:choice>
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
</xsd:element>
<xsd:element name="doc">
<xsd:complexType>
<xsd:complexContent>
<xsd:restriction base="xsd:anyType">
<xsd:sequence>
<xsd:element name="customer" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:complexContent>
<xsd:restriction base="xsd:anyType">
<xsd:sequence>
<xsd:element name="name" type="xsd:string" minOccurs="0" />
<xsd:element name="order" minOccurs="0"
maxOccurs="unbounded">
<xsd:complexType>
<xsd:complexContent>
<xsd:restriction base="xsd:anyType">
<xsd:sequence />
<xsd:attribute name="id" type="xsd:string" />
<xsd:attribute name="year" type="xsd:string" />
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
</xsd:element>
<xsd:element name="notes" minOccurs="0"
maxOccurs="unbounded">
<xsd:complexType>
<xsd:complexContent>
<xsd:restriction base="xsd:anyType">
<xsd:sequence>
<xsd:element name="buys" type="xsd:string"
minOccurs="0" />
<xsd:element name="saleslead" type="xsd:string"
minOccurs="0" />
<xsd:element name="competitor" type="xsd:string"
minOccurs="0" />
</xsd:sequence>
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
<xsd:attribute name="id" type="xsd:string" />
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
<xsd:attribute name="id" type="xsd:string" />
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
</xsd:element>
</xsd:schema>

If you compare the returned XML Schema and the original file included with the sample code, you will find that they are different. SQL Server does not guarantee that it will return the same exact XML Schema document as you submitted, since it translates your schema into the server metadata catalog. For example, comments, annotations, and whitespace are removed, and implicit types are made explicit. If you need to keep a copy of your exact schema document, you should store it in a string column, an xml column, or the file system.

SQL Server defines certain schemas by default; these are common schemas that you may want to use in your XML Schemas in addition to your custom schema definitions. The following are the reserved XML Schemas with their prefixes. Note that you cannot create schemas with the same name as the existing predefined schemas in SQL Server.

xml = http://www.w3.org/XML/1998/namespace
xs = http://www.w3.org/2001/XMLSchema
xsi = http://www.w3.org/2001/XMLSchema-instance
fn = http://www.w3.org/2004/07/xpath-functions
sqltypes = http://schemas.microsoft.com/sqlserver/2004/sqltypes
xdt = http://www.w3.org/2004/07/xpath-datatypes
(no prefix) = urn:schemas-microsoft-com:xml-sql
(no prefix) = http://schemas.microsoft.com/sqlserver/2004/SOAP

An interesting predefined schema is sqltypes. This schema allows you to map your XML data to SQL types such as varchar.

You cannot modify these built-in schemas, nor can you serialize these schemas. You can use the import namespace directive to import these schemas into your own XML Schema, however, and then use the schema in your own custom schema declarations.

As we mentioned earlier, you cannot modify an existing XML schema. You can add new schemas, but then you will need to go through and modify your XML instances to use the new schema. You can drop your schemas and create new schemas, but that will untype your XML, which could be a long operation depending on the number of XML instances contained in your table.

Defining XML Columns Using a GUI

You’ve seen how to create columns using the XML datatype through code. However, SQL Server Management Studio allows you to work with XML in many ways. Figure shows creating a new column of type XML. In the Properties area, you can specify the schema you want to associate with the column, if any. Beyond that, all dialog boxes that take types, such as the new stored procedure header dialog box, will take XML as a type since it is a native type in SQL Server.

Working with the XML datatype in Management Studio

Working with the XML datatype in Management Studio

Setting Permissions for Schema Creation

The code you walked through earlier for schema creation assumed that you already had permissions to create XML Schemas in the server. However, that may not be the case. Since an XML Schema is like other objects in a SQL Server database, you can set permissions on schema collections. One thing to note is that users need both permissions on the relational schema and explicit permissions for XML Schema creation, since XML Schema creation modifies the relational schema as well. The different types of permissions you can grant and their effects are discussed next.

To create an XML Schema, you need to be granted the CREATE XML SCHEMA COLLECTION permission. You need to grant this permission at the database level.

When you set the ALTER permission, users can modify the contents of an existing XML Schema collection using the ALTER XML SCHEMA COLLECTION statement. Remember that users need permissions on the relational schema as well for this to work.

The CONTROL permission allows users to perform any operation on the XML Schema collection. This means that users can create, delete, or edit the schema information. To transfer ownership of XML Schemas from one user to another, you would set the TAKE OWNERSHIP permission.

To use constraints or for typing your XML datatype columns, you would add the REFERENCE permission to users. The REFERENCE permission is also required when one XML Schema collection refers to another.

The VIEW DEFINITION permission allows users to query the contents of an XML Schema collection either through XML_SCHEMA_NAMESPACE or through the XML dynamic management views. Usersneed to also have ALTER, CONTROL, or REFERENCES permission.

To perform validation against schemas, the EXECUTE permission is required. Users also need this permission when querying the XML stored in columns and variables that are typed. The following code shows how to grant permissions for a user to alter the relational and XML Schema inside a database:

-- Grant permissions on the relational schema in the database
GRANT ALTER ON SCHEMA::dbo TO User1
GO
-- Grant permission to create XML schema collections in the database
GRANT CREATE XML SCHEMA COLLECTION
TO User1
GO

Constraining XML Columns

You can use relational constraints on XML columns. There may be times when you will keep your XML data untyped and use constraints instead; for example, if the constraint you want to use is not easily expressed using XML Schemas, such as executing an XPath statement. One example of this may be a constraint that makes sure that the order amount is not discounted by more than a certain amount, which is calculated dynamically, depending on the total cost of the order. Bigger order amounts may get larger discounts, but cannot exceed a certain sliding scale of percentages for discounts.

Another reason to use constraints is if you need to constrain the column based on other columns’ values. For example, you may want to make sure that another value in another column is filled in before allowing users to insert data into your XML column.

There are limitations to using XML columns, though. For instance, XML columns cannot be primary keys or foreign keys, nor can they have unique constraints. However, they can be included in a clustered index or explicitly added to a nonclustered index by using the INCLUDE keyword when the nonclustered index is created.

The following example creates a table with an XML column that has a constraint.

CREATE FUNCTION CheckCustomerName(@x xml)
RETURNS bit
AS
BEGIN
RETURN @x.exist('declare namespace
cust="urn:example/customer";
/cust:doc/cust:customer/cust:name')
END;
GO
--Create a constraint
CREATE TABLE xmltbl3 (pk int IDENTITY PRIMARY KEY,
xmlColWithConstraint XML CHECK(dbo.CheckCustomerName(xmlColWithConstraint) = 1))
GO

The constraint checks to see if the XML being inserted has a customer name. You could also create constraints against other columns, different datatypes, or information contained in the XML itself. Once you create your table with the constraint, you can then try to insert data into the table. If you insert data that does not meet the constraint, you’ll get the following error message:

Msg 547, Level 16, State 0, Line 1
INSERT statement conflicted with CHECK constraint 'CK__xmltbl3__xmlColW__0BC6C43E'.
The conflict occurred in database 'xmldb',
table 'xmltbl3', column 'xmlColWithConstraint'.
The statement has been terminated.

Examining the XML Datatype Limitations

There are a number of limitations you should be aware of with the XML datatype. First, the XML datatype is limited to 2GB in size and 128 levels of hierarchy. Furthermore, the XML datatype cannot be compared, sorted, grouped by, part of a view, part of a clustered or nonclustered index, part of a distributed partitioned view, or part of an indexed view. Some of these limitations are due to the fact that XML is not relational data or that the XML datatype has its own way to create indexes or sort its data. Even given these limitations, you should be able to store the majority of your XML data in SQL Server. If you have data that exceeds these limitations, especially the 2GB size limit, we’d love to learn more about that dataset!

SQL Server stores your XML data using UTF-16 by default. If you need to use a different encoding, you will cast the results or perform a conversion in your application. For example, you could convert your XML result to a different SQL type such as nvarchar. However, you cannot convert the XML datatype to text or ntext. Your XML data will use the collation settings on your server for the result set.

XML data employs the same locking mechanism as SQL Server. This means the XML datatype supports row-level locking. This is true whether you are using T-SQL or XQuery. Therefore, if the row gets locked, all XML instances in that row are locked as well.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

SQL Server 2008 Topics