What Are XPath and the XMLDOM? - SQL Server 2008

Once you have a set of XML documents, you’ll obviously want to query them in order to retrieve relevant information. XPath is a query language that enables you to define which parts of an XML document you want to select. XPath has a parser that interprets the syntax, reaches into the XML document, and pulls out the relevant parts. For example, you may want to return all customers who live in New York from an XML document. To do this, you would write an XPath statement.

Since XML is hierarchical, you can use XPath to specify the path or paths to the XML that you want to retrieve. Think of XML as a hierarchy of nodes. The root node is normally the XML document entity. Then a tree structure is created under the root node for all your XML. If we took the XML sample shown in previous section and mapped it to the XML path hierarchy, we would get the tree shown in Figure not a very exciting tree, but a tree nonetheless. You can see that all elements, attributes, and text have nodes in the tree. There are seven node types you can access in XPath, including the root, element, attribute, namespace, processing instruction, comment, and text nodes. You’ll find yourself working mostly with element, attribute, processing instruction, and text nodes in your XPath.

An example of an XML document tree

An example of an XML document tree

You can use XPath to navigate among these different nodes using XPath axes. XPath axes describe how to navigate the structure by specifying the starting point and the direction of navigation. There are 13 axes in XPath, but you’ll find that you use the child and attribute axes the most. Table lists the 13 axes.

XPath Axes

XPath Axes

XPath Syntax

XPath uses a set of expressions to select nodes to be processed. The most common expression that you’ll use is the location path expression, which returns back a set of nodes called a node set. XPath can use both an unabbreviated and an abbreviated syntax. The following is the unabbreviated syntax for a location path:

/axisName::nodeTest[predicate]/axisName::nodeTest[predicate]

This begins with the forward slash, which refers to the root node as being the context node. Next is the axis name, followed by a nodeTest and an optional predicate. This all can be followed by one or more similar structures to eventually get to the nodes you are interested in retrieving. So, to retrieve all customers, we would use the following unabbreviated XPath syntax. The default axis, if none is provided, is child.

/child::root/child::customer

Most times, however, you’ll use abbreviated syntax. The abbreviated version of the preceding XPath may be //customer. The double slash is the descendant-or-self axis. You can also use wildcards in XPath. XPath supports three types of wildcards: *, node( ), and @*. * matches any element node regardless of type, except it doesn’t return attribute, text, comment, or processing instruction nodes. If you want all nodes returned, use the node( ) syntax. Finally, the @* matches all attribute nodes. Table shows common XPath expression abbreviations.

.XPath Expression Abbreviations

.XPath Expression Abbreviations
Here are examples of using abbreviated XPath syntax with the customer node:

  • To return all the child elements of our customer node:
/customer/*
  • To return all the attributes only:
/customer/@*
  • To return only customers with a region equal to Northwest:
/customer[@region = "Northwest"]

You can also use compound location paths by combining a multitude of path statements. In XPath, there is special syntax beyond the root node syntax of a single slash (/). For example, you can specify all descendants using a double slash (//). You can also select the parent node using a double period (..). Finally, you can select the current element using a single period (.). The following XPath sample selects all element nodes under the root node that have an ID of 10:

//[@id = "10"]

You may also want to access attributes. To select attributes, you use the @ syntax. For example, to select the id attribute, you would use /customer/name/@id. Sometimes you’ll want to filter element nodes based on the attributes they possess. For example, if we had some customers with no region attribute on their state element, you could filter out those customers by using /customer/state[@name].

XPath Functions

XPath provides functions so that you can return values or manipulate your XML. XPath includes string, node set, number, and Boolean functions. The following are the most common functions you’ll use:

  • position(): This function returns the position you specify in the document order. For example, /customer[position() = 2] would return the customer element in position 2. You can abbreviate this function by leaving out the position() = portion. For example, /customer[2] is equivalent to the previous example.
  • count(): This function returns the number of nodes in the node set. For example, if you wanted to count the number of customers, you would use /count(customer). Or, if customers had orders in the XML document, you could use /customer/orders/count(order).
  • contains(): This function takes two strings and returns true if the second string is contained in the first string. For example, if you wanted to find out whether Tom is contained in a string, you would use /customer/name[contains(.,'Tom')].
  • substring(): This function returns part of the string specified. The first parameter is the string, the second is the start position, and the final parameter is the length, such as /customer/ name[substring(.,1,3)].
  • sum(): This function, as its name suggests, sums numbers together. It takes a node set. For example, to sum all the prices of a set of products, you can use sum(/products/product/price).
  • round(): This function rounds to the nearest integer.

The XMLDOM: XML Document Object Model

The XMLDOM is a programming interface for XML documents. With the XMLDOM, a developer can load, create, modify, and delete XML information. The easiest way to understand the XMLDOM is to see it in action. For SQL Server folks, you can think of the XMLDOM like the dataset in terms of being an in-memory representation of your parsed XML document.

When using the XMLDOM, the first thing you need to do is declare an object of the XML Document type. The XML Document type extends the XML Node object, which represents a node of any type in an XMLdocument. After declaring your XML Document, you need to load or create your XML document. To loadthe XML document, you can use the load or load xml methods, as shown here:

Imports System.Xml.XmlDocument
Dim oXMLDOM As New System.Xml.XmlDocument oXMLDOM.Load("c:myxml.xml")
'Or if you already have it as a string
'oXMLDOM.LoadXml(strXML)

Once you have the document loaded, you can traverse the nodes in the document by using an XMLNode object and the ChildNodes property of the DocumentElement. The DocumentElement property returns the XML document, and the ChildNodes property returns the collection of nodes that makes up the document. The following code scrolls through an XML document and outputs the nodes in the document:

Dim oXMLNode As System.Xml.XmlNode
Dim strResult As String = ""
For Each oXMLNode In oXMLDOM.DocumentElement.ChildNodes
strResult += oXMLNode.Name & ": " & _
oXMLNode.InnerText
Next
MsgBox(strResult)

As part of the XMLDOM, you can also get elements by tag name using the GetElementsbyTagName function. For example, if you had an element called customer, you could retrieve its value using this code:

MsgBox(oXMLDOM.GetElementsByTagName("customer").Item(0).InnerText)

The GetElementsbyTagName returns a node list that you can parse. The code just retrieves the first node in the list, but if you wanted to, you could loop through all the nodes and print them out. The XMLDOM has similar functions, such as GetElementsByID or GetElementsbyName.

To return a node list that corresponds to our earlier XPath statement using the XMLDOM, use the SelectNodes method. This method takes an expression that can be an XSL command or an XPath expression. You can also use the SelectSingleNode method to return just a single node rather than a node list. The code that follows runs a passed-in expression and traverses the returned nodes to print out their value:

Dim oNodes As System.Xml.XmlNodeList = oXMLDOM.SelectNodes(txtXPath.Text)
Dim strReturnString as string = ""
Dim oNode As System.Xml.XmlNode
For Each oNode In oNodes
strReturnString = oNode.OuterXml
Next
Msgbox(strReturnString)

The XPathDocument, XPathNavigator, and XPathExpression Classes

While using the XMLDOM for rapid development is OK, if you want a scalable .NET application that uses XPath, you’ll use the following classes:

  • XPathDocument: This is a high-performance, read-only cache for XML documents with the explicit purpose of parsing and executing XPath queries against that document.
  • XPathNavigator: This is a class based on the XPath data model. This class allows you to query over any data store. You can compile frequently used XPath expressions with the XPathNavigator class.
  • XPathExpression: This class is a compiled XPath expression that you can execute from your XPathNavigator class.

The following code instantiates an XPathDocument object and loads some XML into it. Then the code creates an XPathNavigator using the CreateNavigator method. This method is also supported in the XMLDocument class. To execute our XPath expression, the code calls the Select method and passes in the expression. As you can see, the expression looks for customers with the name Tom Rizzo and then returns the state for customers matching that value.

'Instantiate the XPathDocument class.
Dim oXPathDoc As New System.Xml.XPath.XPathDocument("c:note.xml")
'Instantiate the XPathNavigator class.
Dim oXPathNav As System.Xml.XPath.XPathNavigator =oXPathDoc.CreateNavigator()
'Instantiate the XPathIterator class.
Dim oXPathNodesIt As System.Xml.XPath.XPathNodeIterator = &
oXPathNav.Select("//customer/name[. = 'Tom Rizzo']/parent::node()/state")
'Instantiate a string to hold results.
Dim strResult as string = ""
'Use the XPathIterator class to navigate through the generated result set
'and then display the selected Parent Companies.
Do While oXPathNodesIt.MoveNext
strResult += oXPathNodesIt.Current.Value
Loop
Msgbox(strResult)

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

SQL Server 2008 Topics