XML Schemas to develop rule sets

DTDs can be somewhat limiting. Consider, for example, the following XML document:
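A minimal sketch of such a document might look like the following. The element names `record`, `integer`, and `date` are hypothetical, chosen only to match the discussion that follows.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical example: the element names and values are assumptions -->
<record>
  <integer>12</integer>
  <date>2004-05-17</date>
</record>
```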

A DTD written for the preceding code fragment would treat the content of every element as character data. To the DTD, the value of the integer element is not actually an integer, and the date isn’t a date, because DTDs have no numeric, Boolean, or date datatypes.

The W3C introduced another rule-definition language, XML Schema, to support richer data typing and finer-grained rules than DTDs allow. In addition to the kinds of rules DTDs express, a schema can state exactly how many times a child element may occur (through the minOccurs and maxOccurs attributes) and which data type an element may hold, such as a Boolean or an integer.

Data typing is especially important because it makes it easier to exchange data with traditional databases and with application programming interfaces (APIs) written in Java, C++, JavaScript, and other languages.

Working with Schemas
Now that you’re familiar with DTDs, it should be fairly easy to see how their concepts extend to a greater range of datatypes. XML Schema is itself written in XML, so you may find it more intuitive than the DTD syntax you saw earlier in the chapter.

Recall that an example earlier in the chapter created a simple XML document for contacts that was derived from contact.dtd. Let’s call that XML document contact.xml. In the listing that follows, you can see the same principles at work in a schema. Pay particular attention to the xs:element children of xs:sequence, which live inside the xs:complexType element.

A Schema for a Contact XML Document
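As a rough sketch, a schema of this kind might look like the following. The child elements (`name`, `phone`, `birthdate`, `active`) and the target namespace `http://www.example.com/contact` are assumptions made for illustration, not the contents of the original contact.dtd.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the child elements and the target namespace are assumptions -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/contact"
           xmlns="http://www.example.com/contact"
           elementFormDefault="qualified">

  <!-- contact is the root element; it is a complex type because it has children -->
  <xs:element name="contact">
    <xs:complexType>
      <!-- xs:sequence fixes the order in which the children must appear -->
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
        <xs:element name="birthdate" type="xs:date"/>
        <xs:element name="active" type="xs:boolean"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```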

In a DTD, the sequence of elements that should appear in the contact.xml document was defined by placing commas between elements in an element definition. In XML Schema, you list the child elements in the required order inside an xs:sequence element. This is part of the larger definition of the XML document’s root element, which is the contact element. Note the type attribute on each xs:element element, which defines that element’s data type.

Numerous built-in datatypes are available. If you’re familiar with the Java programming language, it might help you to know that many of them closely resemble Java datatypes. If you’re not familiar with Java, you can think of the built-in datatypes as falling into four broad families:

  • numerical (such as integer and double)
  • date and time (such as date)
  • string
  • Boolean
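The four families can be illustrated with a few element declarations. This is only a sketch: the element names are hypothetical, and the comments show sample values each type would accept.

```xml
<!-- Hypothetical element declarations covering each datatype family -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="quantity" type="xs:integer"/>  <!-- numerical, e.g. 42 -->
  <xs:element name="price"    type="xs:double"/>   <!-- numerical, e.g. 19.95 -->
  <xs:element name="shipDate" type="xs:date"/>     <!-- date, e.g. 2004-05-17 -->
  <xs:element name="comment"  type="xs:string"/>   <!-- string, any text -->
  <xs:element name="inStock"  type="xs:boolean"/>  <!-- Boolean: true, false, 1, or 0 -->
</xs:schema>
```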

The contact element is a complex type of element because it contains other elements. An element that holds only text, with no child elements or attributes, is a simple type of element. To reference a schema in an XML document, refer to it like this:
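A schema reference of this kind might be sketched as follows. The namespace URI `http://www.example.com/contact`, the file name `contact.xsd`, and the child elements are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the namespace URI, contact.xsd, and the children are assumptions -->
<contact
    xmlns="http://www.example.com/contact"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.example.com/contact contact.xsd">
  <name>Jane Doe</name>
  <phone>555-0100</phone>
  <birthdate>1980-03-15</birthdate>
  <active>true</active>
</contact>
```

The value of xsi:schemaLocation is a pair: the namespace URI first, then a hint telling the processor where to find the schema for that namespace.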

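The schema is referenced through a namespace. A namespace is represented in an XML document by a namespace declaration, which looks suspiciously like an ordinary element attribute but plays a different role: instead of describing the element, it binds a prefix (or the default namespace) to a URI. This is an important distinction when you process the document, because namespace-aware tools handle declarations separately from ordinary attributes. XPath, for example, models them as namespace nodes rather than attribute nodes, so don’t confuse the two.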

Only the xsi:schemaLocation line is an ordinary attribute/value pair. The other two lines of code are namespace declarations, which serve as identifiers. They tell a processor that the elements associated with them belong to a distinct vocabulary and may have specially developed definitions. The important part of the namespace is the Uniform Resource Identifier (URI), which is what gives a namespace its unique identity. Therefore, when elements live within a specific namespace governed by a schema, they must adhere to the rules of that schema.

The first namespace in the preceding code fragment matches the target namespace established in the schema, which binds the schema to a unique identifier, in this case a Web site address.

You don’t have to refer to a Web site, and the reference is not actually a physical pointer; a processor does not retrieve anything from it. Instead, the URI is simply an easy way to establish identity, because a Web site address should be unique. It isn’t guaranteed to be unique, of course, because anyone can copy your Web site address and use it for their own schema, but using a Web address as a namespace name has become fairly standard practice. You could, instead of a Web site name, use a long mash of characters, as in the following example:
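A minimal sketch of such a declaration, using a made-up `urn:uuid:` identifier (the value is hypothetical and carries no meaning beyond being unlikely to collide with anyone else’s):

```xml
<!-- Hypothetical identifier: any unique URI works, and it is never fetched -->
<contact xmlns="urn:uuid:6f1c2b9e-4d3a-4e8b-9a57-0c2f7d51e8a3">
  <name>Jane Doe</name>
</contact>
```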


The second namespace is the W3C’s XML Schema instance namespace, conventionally bound to the xsi prefix. Declaring it makes the xsi:schemaLocation attribute available, which tells the processor where to find the resource you’re using, in this case, a schema on the path named in that attribute. When a validating processor finds the schema, it attempts to validate the XML document as the document loads.

If the XML document doesn’t conform to the rules you set forth in the schema definition, an error will result (assuming your parser can work with XML Schema).

XML on the Web
Many companies use XML on the Web as part of their middle tier. For example, a database can be used to store and return data to users, but along the way that data may be converted to XML, which, in turn, is transformed using Extensible Stylesheet Language Transformations (XSLT) into HTML that a browser can render. XSLT has thus become an integral part of many XML deployments on the Web. To render any meaningful HTML from XML, you’ll need to have at least a basic understanding of how XSLT works.
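To make the XML-to-HTML step concrete, here is a minimal XSLT sketch. It assumes a contact document in no namespace with hypothetical `name` and `phone` child elements; a real stylesheet would depend on the structure of your own data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes a contact root element (no namespace)
     with hypothetical name and phone children -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Match the root contact element and emit a small HTML page -->
  <xsl:template match="/contact">
    <html>
      <body>
        <h1><xsl:value-of select="name"/></h1>
        <p>Phone: <xsl:value-of select="phone"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

If the source document declares a default namespace, the match and select expressions would need a prefix bound to that namespace, because unprefixed names in XSLT 1.0 patterns match only elements in no namespace.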

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd