# Querying XML Data - SQL Server 2008

Once you get your XML data into SQL Server, you may want to get it back out. The XML datatype provides four methods for you to do this: query(), value(), exist(), and nodes(). Another method on the XML datatype is modify(), which we discuss later in the section “Modifying XML Data.” These methods are based on the use of the XQuery language, so we’ll start with a quick XQuery tutorial. Then we’ll investigate how to use these methods to query XML data.

XQuery 101

If you currently use XPath or XSLT, XQuery should not be entirely unfamiliar to you. You’re used to iterating over hierarchies of data and the semantics of the XPath language. However, if you do only relational work with T-SQL, XQuery may look like a strange new beast. The reason for this is that T-SQL works on rows and columns, not hierarchies of information. XML data is structured differently from relational data, and it will take a bit of time for relational developers to get used to XQuery or even XPath syntax. That should not stop you from starting to learn these different languages, though, since more and more information will be stored in XML over the coming years.

SQL Server 2000 supported XML shredding and XPath 1.0 expressions, and you saw some examples of these technologies in the previous chapter. However, XPath gets you only so far in working with your XML data. It lacks the ability to perform more complex operations on your XML data, as it does not support recursive XML schemas and projection and transformation syntax. XQuery is the big brother of XPath, in that the simplest form of an XQuery is an XPath expression.

Before we dive into XQuery, though, let’s take a look at an XPath statement just in case you’ve never used XPath before or didn’t read the previous chapter. We’ll work with the following sample XML document. You can follow along with the samples in this chapter by opening the file XMLSample.sql included with the sample downloads for this book, which you can find in the Source Code/Download area of the Apress web site (http://www.apress.com).

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<people>
<person>
<name>
<givenName>Martin</givenName>
<familyName>Smith</familyName>
</name>
<age>33</age>
<height>short</height>
</person>
<person>
<name>
<givenName>Stacy</givenName>
<familyName>Eckstein</familyName>
</name>
<age>40</age>
<height>short</height>
</person>
<person>
<name>
<givenName>Tom</givenName>
<familyName>Rizzo</familyName>
</name>
<age>30</age>
<height>medium</height>
</person>
</people>

To retrieve all the names for all the people, you execute the following XPath statement:

/people//name

To move this XPath query to an XQuery, you put a curly brace around the query, which tells the XQuery processor that the query is an XQuery expression and not a string literal.

FLWOR

While you can use XPath expressions in XQuery, the real power of XQuery is through FLWOR. FLWOR stands for For-Let-Where-Order By-Return. FLWOR is similar to T-SQL’s SELECT, FROM, and WHERE statements, in that you can conditionally return information based on criteria that you set. However, instead of returning relational data like T-SQL, XQuery returns XML data. Let’s look in more detail at each of the parts of FLWOR.

• The For expression lets you specify the XML that you want to iterate over. The way you specify your XML is by using an XPath expression. Normally, you surround your entire XQuery statement with any beginning or ending XML tags that you want. This will depend on which nodes you iterate in your document and the desired structure of the returned XML document. You can think of this expression as being similar to combining the SELECT and FROM T-SQL statements.
• The Let expression allows you to set a variable to an expression, but does not iterate over the values.
• The Where expression is similar to the T-SQL WHERE clause. This expression evaluates to a Boolean. Any value that returns True passes through, and any value that returns False is rejected.
• The Order By expression is similar to the SQL ORDER BY clause. This expression allows you to sort your result sets using the sort order you specify.
• The Return expression specifies the content that should be returned.

The following is a more complex example of an XQuery statement. The example iterates over the XML contained in a table and returns only the given name of people who have an age element in the XML document.

SELECT people.query(
'for $p in //people where$p//age
return
<person>
<name>{$p//givenName}</name> </person> ' ) FROM xmltblnew The following XQuery counts the number of person elements in an XML document. It also shows how to use the query method of the XML datatype with XQuery. SELECT people.query( 'count(//person) ') FROM XMLtblnew go Beyond FLWOR, XQuery supports functions such as avg, min, max, ceiling, floor, and round. The following example shows how to calculate the rounded average age of people from an XML document: SELECT people.query( 'round(avg(//age)) ') FROM XMLtblnew go XQuery has string functions such as substring, string-length, and contains. It also has date/time functions such as dateTime-equal, dateTime-less-than, dateTime-greater-than, and individual date and time functions. The example that follows shows some of these functions in use. First, you get all the nodes in the XML document under the people node that have an age. Then you return the givenName element for each person. If you wanted to return only the data for this element, you could use the data function. Instead, you want to return the full element for the givenName. Next, if there is a match on a particular name, such as Martin, you want to return True; otherwise, you want to return False. Finally, the code figures out the maximum age of all people by using the max function. SELECT people.query( 'for$c in (/people)
where $c//age return <customers> <name> {$c//givenName}
</name>
<match>{contains($c,"Martin")}</match> <maxage>{max($c//age)}</maxage>
</customers>
')
FROM xmltblnew
Go

Table lists the functions you can use against the XML datatype when using XQuery.

XML Datatype XQuery Functions

XQuery in More Depth

If you break apart the XQuery statements shown earlier, XQuery contains a prolog and a body. The prolog contains declarations and definitions that set up the necessary environment to run the body. This setup could include declaring namespaces in the prolog. The body consists of a series of XQuery expressions that will get you your desired result.

To declare a namespace in the prolog, you have two choices: put an in-line declaration in the prolog or use the WITH XMLNAMESPACES keyword. To improve readability of your queries, you will want to use the WITH XMLNAMESPACES approach, since declaring namespaces in-line can make a query very hard to understand and read. For example, take a look at the following two queries. You can decide for yourself which one is easier to understand, but most likely you will agree that it’s the second one.

SELECT people.query(
'declare namespace peopleschema="urn:example/people";
round(avg(//peopleschema:age))
')
FROM XMLtblnew
GO
WITH XMLNAMESPACES('urn:example/people' AS peopleschema)
SELECT people.query(
'round(avg(//peopleschema:age))
')
FROM XMLtblnew
GO

As part of the prolog, you can also declare a default namespace. You can then omit element namespace prefixes on all your elements in that namespace. This is useful if you have only one namespace for all elements in your XML document. You would use the declare default element namespace syntax if you declared your namespace in the prolog, or you would use the WITH XMLNAMESPACES (DEFAULT'yournamespace') syntax if you declared the namespace outside the prolog.Note that you can use the WITH XMLNAMESPACES syntax with other operations besides XQuery operations. You can also use this syntax with FOR XML if you want to add a namespace to your FORXML-rendered XML documents.

Once you begin working with the body of your XQuery, you will start using the literals and operators we discussed previously, such as sum, min, max, for, order by, path expressions, sequence expressions, and so on.

Always remember that comparison operators are XML-encoded, which means less than (<) is &lt;, greater than (>) is &gt;, and so forth.

Also note that you can use the built-in types such as xs:string and xs:date in your XQueries.Casting is supported using the CAST AS operator for explicit casting, but XQuery also supports implicit casting for types it can coerce to be other horizontal types. Numeric types are supported only when doing implicit casting.

XQuery Processing in SQL Server

Because of its modular architecture, SQL Server can leverage the existing query execution and optimization technologies to process XQuery. Rather than building a separate query execution pipelineor optimization infrastructure, SQL Server leverages the same infrastructure for XQuery that it does for T-SQL. Because of this implementation, the level of integration between XQuery and T-SQL is superior. This is what allows you to do rich cross-domain queries between the two languages. For example, when SQL Server encounters an XQuery statement, the XQuery parser is called. The results of the XQuery parser, which is a query tree, are grafted onto any existing query to generate the entire query tree, which may contain both relational and XQuery queries. In effect, the T-SQL query and the XQuery are combined into a single query. The entire query is then passed to the optimizer, and an XML Showplan is created that includes the XQuery operations, as shown in Figure.

Execution Showplan with an XQuery

Basic XML Query Methods

Let’s now examine the four methods for querying XML datatypes:

• query(): Returns the XML that matches your query
• value(): Returns a scalar value from your XML
• exist(): Checks for the existence of the specified XML in your XML datatype
• nodes(): Returns a rowset representation of your XML

query()

The query() method takes an XQuery statement and returns the XML that matches the query. The XML that is returned is untyped XML and can be further parsed if needed. To see how this works, create a new table, xmltblnew, to store the XML. You’ll use this table to learn how to query and modify XML data.

-- Create a new table
CREATE TABLE xmltblnew (pk INT IDENTITY PRIMARY KEY, people XML)
GO
--Insert data into the new table
INSERT INTO xmltblnew (people)
SELECT *
FROM OPENROWSET (
BULK 'C:peopleXML.xml',
SINGLE_BLOB) AS TEMP
GO

The following example uses the query() method to look for all people who have an age, and then returns XML that identifies each person by name:

SELECT people.query(
'for $p in //people WHERE$p//age
return
<person>
<name>{\$p//givenName}</name>
</person>
'
)
FROM xmltblnew

As you look at the preceding XQuery, you will notice that it maps somewhat to T-SQL in the following ways:

• SELECT is equivalent to RETURN.
• FROM is equivalent to FOR.
• WHERE is equivalent to WHERE.

The interesting part is that the semantics of the query are different. Rather than using relational types of operations, you use hierarchical path operations. The XQuery—and for that matter, XPath syntax—takes some getting used to. If you haven’t thought in a hierarchical, path-based way before, you should start out with simple queries to learn the language. Once you progress from the simple queries, you will see that XQuery can be almost as complex in terms of functionality as T-SQL.

value()

The value() method returns a scalar value back from the XML instance. This method takes two arguments: the first is the XQuery you want to use to get the value, and the second is the T-SQL type that you want that value converted to. You cannot convert to a timestamp or the NTEXT, TEXT, or IMAGE types. Also, you can’t convert to an XML or a sql_variant datatype.

You will want to try and match types between XML and SQL. For example, a string value in XML can be converted to any T-SQL string type. Numbers can be converted to numeric types. In addition, string values that can be coerced into numbers can be converted into numeric or money types. Date, time, or string values that can be valid datetime values can be converted into datetime types.

The following code snippet gets the age of the second person in the XML document and returns it as an integer:

SELECT people.value('/people[1]/person[2]/age[1][text()]', 'integer')
as age FROM XMLtblnew
GO

As a quick aside, try running the following code and note the error you get:

SELECT people.value('/people[1]/person[2]/age[1][text()]', 'integer')
as age FROM XMLtblnew
GO

You will get an archaic error telling you that the value function requires a singleton or an empty sequence and not an operand of type xdt:untypedAtomic. You may be looking at this and thinking, “But I return the text of the first age element, which is a singleton!” Well, when SQL Server goes through its evaluation, it looks at the entire expression to see if any part can return more than a single node. In the code that returns an error, the people path in the expression does not have a position predicate, so it could possibly return more than a single node. The easiest way to fix this common error is to make sure you use position predicates in your expressions.

exist()

The exist() method is used to check for the existence of an XML datatype. This method returns 1 if the XQuery expression returns a non-null result. Otherwise, this method will return 0 if the condition is not met or if the result set is null. The following example returns the columns in the SELECT statement where the givenName of the person stored in the XML document is equal to Tom.

SELECT pk, people FROM xmltblnew
WHERE people.exist('/people/person/name/givenName[.="Tom"]') = 1

nodes()

The nodes() method returns a rowset for each row that is selected by the query. You can then work on that rowset using the built-in functionality of SQL Server. The following example returns the XML in relational form by breaking the XML into individual rows per person using CROSS APPLY. You could also use OUTER APPLY if you like. The nodes() method is similar to OPENXML, except that OPENXMLrequires you to prepare the document, and it uses the Document Object Model (DOM) to parse the document, which can slow down performance. The XML datatype is already parsed, so you could see better performance using nodes() than using OPENXML.

SELECT T2.Person.query('.')
FROM xmltblnew
CROSS APPLY people.nodes('/people/person') as T2(Person)
Results:
Row 1: <person><name><givenName>Martin</givenName>
<familyName>Smith</familyName>
</name><age>33</age><height>short</height></person>
Row 2: <person><name><givenName>Stacy</givenName>
<familyName>Eckstein</familyName></name><age>40</age>
<height>short</height></person>
Row 3: <person><name><givenName>Tom</givenName>
<familyName>Rizzo</familyName></name>
<age>30</age><height>medium</height></person>

Cross-Domain Queries

There may be times when you want to combine your relational and XML data. You’ve already seen some examples of cross-domain queries in the previous section, whereby you combine relational queries with XML queries, or vice versa.

SQL Server provides functionality for you to use your relational data in your XQuery through the sql:variable() and sql:column() methods. The sql:variable() method allows you to apply a SQL variable in your XQuery. The sql:column() method allows you to use a SQL column in your XQuery.

The following example uses the sql:column() method to retrieve values from relational columns in the table and sql:variable to retrieve a T-SQL variable value, and it uses both of these to generate a result set:

USE xmldb
GO
CREATE TABLE xmlTable (id int IDENTITY PRIMARY KEY,
CustomerID char(5),
Name varchar(50),
xmlCustomer XML);
GO
INSERT INTO xmlTable
VALUES ('AP', 'Apress LP', 'Berkeley CA', '<Customer />');
GO
DECLARE @numemployees int;
SET @numemployees=500;
SELECT id, xmlCustomer.query('
DECLARE namespace pd="urn:example/customer";
<Customer
CustomerID= "{ sql:column("T.CustomerID") }"
CustomerName= "{ sql:column("T.Name") }"