XPath - HTML

When working with XML documents, you’ll often work with a process that takes one or more chunks of the XML document and does something with them. The process may be one that transforms the XML into an HTML document so that browsers can display the XML data in a nicely formatted way, or it may be a SQL Server database that extracts bits of an XML document to dump into a query or database table. For these processes to work, you need to be able to get at certain parts of a document. Generally, the way to do that is through XPath.

XPath is often associated with a host language that uses specific aspects of XPath but may expand on the core XPath framework and provide additional functionality. XSLT is a classic example of a host language for XPath. XQuery is another.

XPath, like XML, is case-sensitive, and all XPath keywords use lower case. At the core of XPath is an expression. The result of every expression is a sequence, an ordered collection of zero or more items.

Finding information using XPath
When someone gives you an address for an important meeting and you don’t know where it is, what do you do? Most likely, you employ one of those online mapping services or maybe even use a GPS service from your car. Either way, you end up with a mapping service that shows you the route to your address. This route may be a very short, simple route, or a very complex one, depending on the quality of the mapping service and where your address is in relation to your starting point.

Assume for a moment that someone has given you directions from one part of San Francisco to another. You’re trying to get from the 500 block of Hayes Street to 50 United Nations Plaza by car. To do this, you need to know something about the structure of the city’s street layout. There is some linkage between each step of the route. Here are the basic directions:

  1. Start out going North on OCTAVIA ST toward IVY ST.
  2. Turn RIGHT onto GROVE ST.
  3. Turn RIGHT onto HYDE ST.
  4. Turn LEFT onto MARKET ST.

Notice that you can’t go straight from Octavia to Hyde Street. You have to follow a specific series of steps because all of the streets in the city are connected to each other and have a relationship with one another that you must address to traverse the city.

XPath works the same way. XPath addresses an XML document, allowing you to traverse that document. Luckily, traveling around an XML document is much easier than traveling around San Francisco, because an XML document has a tree structure, whereas San Francisco streets require years of study to understand.

Locations and steps
A typical XPath expression walks along the structure of a document, relying on the link between the node being searched for and the root node. Break that link, and it is much more difficult to find out where you are in a document.

Note: The root node in XPath is always the document node. Don’t confuse this with the root element, which is the first element encountered in an XML document.

You can navigate an XML document by "walking" along its tree structure.

Consider the following path. It’s not an XPath, but it looks a lot like one, as you’ll soon discover:

C:\Program Files\Internet Explorer\Setup

The preceding snippet is an addressing scheme for a file management system on your hard drive (if you’re on Windows). An XPath works the same way when it traverses an XML document to help you and your cohorts find information. Consider the XML document in the following listing.

Map Directions Mapped to an XML Document

To extract information out of the listing, you need to start somewhere. That beginning is referred to as the context node, the originating node from which an XPath expression is evaluated. To find the street element representing the first step you need to get to your destination, you write an XPath that walks the XML document tree, as in the following example:


The [1] in the preceding code fragment indicates the first node within a node set, so directions[1] means the first directions element. When you lead off your expression with the / character, you are indicating the document’s root node. More formally, a path expression consists of a series of one or more steps separated by /, and it can, but is not required to, begin with / or // (you’ll learn about the // characters later).
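Here is a minimal sketch of that kind of lookup, using Python’s standard xml.etree.ElementTree module, which supports a limited subset of XPath. The XML below is a hypothetical stand-in for the map directions document: the element names (mapdirections, directions, step, street), the number attribute, and the nesting are assumptions made for illustration. ElementTree evaluates paths relative to the element you call it on, so the root element plays the part of the starting point here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and nesting are assumptions.
XML = """<mapdirections>
  <directions>
    <step number="1"><street>OCTAVIA ST</street></step>
    <step number="2"><street>GROVE ST</street></step>
    <step number="3"><street>HYDE ST</street></step>
    <step number="4"><street>MARKET ST</street></step>
  </directions>
</mapdirections>"""

# fromstring returns the root element (<mapdirections>), not a document node.
root = ET.fromstring(XML)

# Walk the tree step by step: first <directions>, first <step>, its <street>.
first_street = root.find("directions[1]/step[1]/street")
print(first_street.text)  # prints OCTAVIA ST
```

Each step between the / characters narrows the search by one level of the tree, and [1] picks the first matching element at that level.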

In other words, I didn’t have to lead off the preceding statement with the / character; I simply chose it to be certain that the XPath processor would begin evaluating the XML document at the root level. If you look at the directions to the destination again, you’ll see that no matter where you are in your route, the starting point never changes. That starting point is like your root node. But as you progress along the route, obviously your position does change. This position along the route is your context node.

From this point, at any time along the route, you can change your direction. You can decide to change routes or even your final destination, but you must always begin at your current point along the route. If you leave off the / character, the XPath processor will begin to evaluate the expression from wherever the processor was in the document at the time the statement was read; in other words, from your context node. So leading off with a / character forces the starting point of your journey to begin at the root node of the document. If you tried to access the node by giving XPath analysis software nothing more than the street element name, the software would likely not find it. It would be as if a mapping tool, in giving you the directions I’ve been referring to, simply said, “Go to Hyde Street.”
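The effect of the context node can be sketched with Python’s standard xml.etree.ElementTree module. The document layout below is hypothetical (the element names and nesting are assumptions), and ElementTree implements only a subset of XPath. The same relative path, street, finds nothing from one context node and succeeds from another.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and nesting are assumptions.
XML = """<mapdirections>
  <directions>
    <step number="1"><street>OCTAVIA ST</street></step>
    <step number="2"><street>GROVE ST</street></step>
    <step number="3"><street>HYDE ST</street></step>
    <step number="4"><street>MARKET ST</street></step>
  </directions>
</mapdirections>"""

root = ET.fromstring(XML)

# Context node = root element. <street> is not a direct child, so no match.
from_root = root.find("street")
print(from_root)  # prints None

# Move the context node to the third <step>; now <street> is a direct child.
third_step = root.find("directions/step[3]")
from_step = third_step.find("street")
print(from_step.text)  # prints HYDE ST

# ".//" searches all descendants of the context node, at any depth.
all_streets = [s.text for s in root.findall(".//street")]
print(all_streets)
```

Asking the root element for street alone is the “Go to Hyde Street” problem: the processor has no route from where it stands to the element you want.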

Note: The root node in XPath is always the document node, and it contains all the nodes of the entire document. Don’t confuse this with the root element, which is the first element encountered in an XML document.

A street system in a city is a form of linking system. XPath also provides a consistent linking mechanism via a special notation called location steps.

In their simplest form, location steps are simply the process of getting from one part of the tree to another, step by step, until you reach your destination. In other words, a location step is a progression through an XML document tree that begins at the context node and moves through the hierarchy in a specific direction you define in order to get to your destination. A discussion of some fundamentals behind location steps follows to show you how you can access different nodes.

For simplicity, continuing with the idea of map directions in San Francisco, say you only need the value of the last street in the directions. You already know where you’re going to end up. You can see this by scanning the document. However, the XPath processor doesn’t know this and will need specific instructions on how to navigate to the last street element. There are actually a lot of ways to do that, as you will see as you move your way around XPath.
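One of those ways can be sketched with Python’s standard xml.etree.ElementTree module, which understands the last() position test. The document layout is hypothetical (the element names and nesting are assumptions), so treat this as an illustration rather than the only route to the last street.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and nesting are assumptions.
XML = """<mapdirections>
  <directions>
    <step number="1"><street>OCTAVIA ST</street></step>
    <step number="2"><street>GROVE ST</street></step>
    <step number="3"><street>HYDE ST</street></step>
    <step number="4"><street>MARKET ST</street></step>
  </directions>
</mapdirections>"""

root = ET.fromstring(XML)

# Route 1: let the path pick the last <step>, then take its <street>.
last_street = root.find("directions/step[last()]/street")
print(last_street.text)  # prints MARKET ST

# Route 2: collect every <street> in document order and take the final one.
also_last = root.findall(".//street")[-1]
print(also_last.text)  # prints MARKET ST
```

Both routes end at the same element; the first states the position inside the path itself, while the second leaves the position to the host language.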

When traversing documents using location steps, you can use unabbreviated or abbreviated syntax. The unabbreviated syntax relies on something called an axis. An axis uses the context node, which is basically whatever your starting point is when your location step is defined, to move either forwards or backwards from the context node, or, if you prefer, up and down the XML source document tree. The figure shows a more traditional XPath schematic of a document.

Walking up and down a document tree reveals a series of steps you can use to traverse a tree.

A schematic of an XML document.

A typical unabbreviated axis notation looks like this:

axis::nodetest

The axis is on the left side of the :: characters, and on the right side is a node test. Here’s an example with the XPath you need to drop into the statement in bold:


If you want to access one or more of the nodes indicating a street value, you’ll need to address your document in the same way you would give someone directions to an address:

  1. The mapdirections node is retrieved when using child::* or its abbreviated syntax, /* or *.
  2. The startingPoint node is retrieved when using child::*/child::* or its abbreviated syntax, /*/*.
  3. The first step node is retrieved when using child::*/child::*/child::* or its abbreviated syntax, /*/*/*.
  4. Each street node is accessed using /child::*/child::*/child::*/street or /*/*/*/street.

Each step progresses along the tree following a very specific pattern until you find your way to one of the elements you’re looking for.
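The level-by-level progression in the list above can be sketched with the abbreviated * syntax in Python’s standard xml.etree.ElementTree module. The document layout is hypothetical: the element names and nesting are assumptions, and the second level is called directions here. ElementTree starts at the root element rather than the document node, so the root element itself corresponds to /*, and each extra * in a relative path goes one level deeper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and nesting are assumptions.
XML = """<mapdirections>
  <directions>
    <step number="1"><street>OCTAVIA ST</street></step>
    <step number="2"><street>GROVE ST</street></step>
    <step number="3"><street>HYDE ST</street></step>
    <step number="4"><street>MARKET ST</street></step>
  </directions>
</mapdirections>"""

root = ET.fromstring(XML)

# Level 1 (/* in XPath): the root element itself.
level1 = root.tag

# Level 2 (/*/*): every child element of the root element.
level2 = [e.tag for e in root.findall("*")]

# Level 3 (/*/*/*): every grandchild element.
level3 = [e.tag for e in root.findall("*/*")]

# Level 4 (/*/*/*/street): the <street> children of those grandchildren.
streets = [e.text for e in root.findall("*/*/street")]

print(level1)   # prints mapdirections
print(level2)   # prints ['directions']
print(level3)   # prints ['step', 'step', 'step', 'step']
print(streets)  # prints ['OCTAVIA ST', 'GROVE ST', 'HYDE ST', 'MARKET ST']
```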

Using axes for directing traffic
When you’re viewing directions to an address on a city street, you are usually told to turn right or left at certain intersections. When dealing with XML documents, the direction you turn is called an axis, only instead of turning right or left, you move forward, in reverse, or sideways. When you move forward, you refer to a child axis, as you’ve just seen. In the child::* XPath, child is the axis, the :: characters are a delimiter, and the * is a node test. The node test might be something else, such as a specific element. When you move in reverse, you refer to a parent axis, which looks like this: parent::*. When you move sideways, you refer to a sibling axis, like this: preceding-sibling::* or following-sibling::*. XPath uses a number of axes. Each kind of axis lets you traverse the document going in one direction or another.
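The three directions can be sketched in Python, with one caveat: the standard xml.etree.ElementTree module does not accept the axis::nodetest syntax. The sketch below therefore emulates the child and sibling axes by walking the tree by hand and uses the abbreviated .. step for the parent axis. The document layout is hypothetical (the element names and nesting are assumptions).

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and nesting are assumptions.
XML = """<mapdirections>
  <directions>
    <step number="1"><street>OCTAVIA ST</street></step>
    <step number="2"><street>GROVE ST</street></step>
    <step number="3"><street>HYDE ST</street></step>
    <step number="4"><street>MARKET ST</street></step>
  </directions>
</mapdirections>"""

root = ET.fromstring(XML)
directions = root.find("directions")

# Forward (child::*): iterating an element yields its child elements.
steps = list(directions)

# Use the third <step> as the context node.
context = steps[2]
idx = steps.index(context)

# Sideways (preceding-sibling::* and following-sibling::*), emulated by
# slicing the parent's child list around the context node.
preceding = [s.find("street").text for s in steps[:idx]]
following = [s.find("street").text for s in steps[idx + 1:]]

# Reverse (parent::*): ".." is the abbreviated parent step in ElementTree.
parent = root.find("directions/step[3]/..")

print(preceding)   # prints ['OCTAVIA ST', 'GROVE ST']
print(following)   # prints ['MARKET ST']
print(parent.tag)  # prints directions
```

A full XPath processor would let you write the axes directly in the expression; the hand-rolled slicing here only mirrors what those axes select.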

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd