Origin of Markup Languages - XML

Markup languages (MLs), such as SGML, HTML, and Extensible Hypertext Markup Language (XHTML), are used to describe the structure and formatting of text. In traditional print publishing, the instructions given to printers to print a page in a specified format were called markup, and a collection of such instructions was called a markup language. This is how the concept of MLs originated. These instructions were written in a format that differed from the main text so that they could be easily distinguished from it. This format later evolved into the tags that markup languages use today.

For example, to make text bold, HTML uses the <B> tag at the beginning of the text and the </B> tag at the end of the text, as shown in the following example:

<B>This text appears in bold.</B>


The following sections discuss SGML, HTML, and XML in detail.


SGML

Introduced in 1969, Generalized Markup Language (GML) was one of the first markup languages. GML used tags to format the text in a document. However, different documents created using GML required different programs to process them, and no standard existed for processing GML documents. This led to the evolution of Standard Generalized Markup Language (SGML).

SGML, introduced in 1986, is a markup language used to define other markup languages. Therefore, SGML is considered a meta language. It was the first international standard used to describe the structure of a document: the International Organization for Standardization (ISO) adopted SGML as a standard markup language (ISO 8879). SGML allows you to define and create documents that are platform independent and can be used to exchange data over a network.

Despite the advantages that SGML offered, developers felt a need for another markup language because the authoring software (software used to create SGML documents) was complex and expensive. In addition, with the increasing popularity of the Internet, it was necessary to create a markup language that could be used to develop Web pages easily and efficiently. As a result, HTML was developed in the early 1990s.


HTML

HTML is a markup language that evolved from SGML. However, unlike SGML, HTML does not require expensive software for creating documents. Instead, you can create an HTML document using text editing software, such as Notepad.

HTML is used to describe and format data so that the data can be viewed using a Web browser. Similar to SGML and any other markup language, an HTML document contains tags. For example, to provide a title to the document, you use the <TITLE> tag. Similarly, to include headings in your document, you can use heading tags, such as <H1>, <H2>, and so on. Using these and several other tags, you can create a Web page as shown in the following example:

APB Publications Inc. is a group of publishers based in New York. It has been publishing technical books for the past 10 years and is now moving into publishing books on fiction.

It has over 10,000 employees working in different branches of the organization.

After writing the preceding code, you need to save the file with an extension of .htm or .html. You can then open the file in a browser, such as Internet Explorer, to view the output of the code. The output of the preceding code is shown in the following figure.

A Sample HTML Document

As you can see, creating documents using HTML is easy. However, HTML does not allow you to create custom tags. This implies that you are limited to using only the tags that are predefined by HTML to define the formatting of the text in the page. To create your own tags with meaningful names, you need XML, which is another markup language. In addition, because HTML combines data with its presentation, HTML does not allow you to present the same information in different formats without creating separate pages.


XML

Having discussed markup languages, such as SGML and HTML, we will now look at XML and the advantages that it offers over these languages.

XML, a subset of SGML, is a text-based markup language used to describe data. However, you cannot format data using XML. To do so, you need to use style sheets. You will learn about style sheets later in this section.

You can create an XML document by using authoring tools that are simple and readily available. Similar to HTML documents, you can create an XML document in a text editor, such as Notepad. To store a text file as an XML document, you need to save it with the .xml extension. You will learn to create an XML document later in this section.

Another significant advantage of XML is its ability to create tags with names that users can identify easily. This implies that XML allows you to create elements, attributes, and containers with meaningful names, as and when the user requires.

Now we will consider another example of an HTML file:

The output of the preceding code is shown in the following figure.

Another HTML Document.

As you can see, the tags that HTML uses describe how the text is formatted, not what the text means. For example, it would be easier for a user to interpret a tag with the name <Employee_Name> than a tag with the name <h2>. Therefore, the preceding document can be rewritten with custom tags by using XML as shown:
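
A minimal sketch of such a document might look like the following. Only the Employee_Name tag is named in the text above; the other element names and the sample values are assumptions chosen for illustration.

```xml
<?xml version="1.0"?>
<!-- Employee.xml: each element name describes the data that it holds -->
<Employees>
  <Employee_Name>John Smith</Employee_Name>
  <Age>30</Age>
  <Designation>HR Executive</Designation>
  <Department>Human Resources</Department>
</Employees>
```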

You can view the output of the preceding code by saving the file with the name Employee.xml and opening the file in a browser, such as Internet Explorer. The XML document will appear as shown in the following figure.

A Sample XML Document.

Because an XML document is plain text that is marked up with custom, meaningful tags, XML is a hardware and software independent markup language. This implies that an XML document can be interpreted easily by any computer that is running on any operating system. Therefore, XML is widely used as a markup language that transfers structured data over a network. In addition, XML is used to transfer structured data in high-level Business-to-Business (B2B) transactions.

The following list summarizes the advantages of XML as a markup language:

  • XML is widely used as a standard markup language for the exchange of data over a network. This is because XML allows you to describe content in a text-based format that both the sending and the receiving applications can understand easily. Its extensibility and universal format make XML well suited for data exchange between applications.
  • XML can make searching for information on the Internet easier. Search engines that index HTML pages either search the entire text of a page or search the keywords stored in the metadata of the page. However, a search based on metadata keywords is not accurate because keywords can be misleading, and a full-page search is time consuming. In addition, because HTML tags describe only the format of the page and not the content that is stored in the page, the results returned by searching HTML pages are often unsatisfactory.

For example, if you specify the keywords Linux programming to search for sites on Linux programming, the search engine returns all pages that contain these two words. The search results might also include Web pages on Windows programming that mention Linux only in passing. The search engine cannot judge the context in which the words Linux and programming are used in a Web page.

However, consider a situation in which XML is used to create Web pages, which include tags as shown in the following example:
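
As a rough sketch, such a page description might look like the following. Only the subject and category tags come from this discussion; the page root element, the title element, and the sample values are hypothetical.

```xml
<?xml version="1.0"?>
<!-- Hypothetical page description; page and title are assumed names -->
<page>
  <title>Getting Started with Shell Scripting</title>
  <subject>Linux</subject>
  <category>Programming</category>
</page>
```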


If a search engine parses the document containing the tags and retrieves the data from the <subject> and <category> tags, then the result returned by the search engine is accurate. This helps to omit several thousand Web pages that contain these keywords in a different context. However, this scenario is unlikely because the Web is not expected to become XML based in the near future.

  • XML allows you to create custom tags. The tags that you create using XML can be based on the requirements of a document; therefore, they can be used as a vocabulary for all related documents. For example, you can create a vocabulary of tags for describing the details of an employee. After creating the tags, the tags can be used as a template to describe the information about all the employees of the organization.

An example of such a template is shown in the following code:
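
One possible shape for such a template is sketched below. The Employee wrapper element and the second record are assumptions made for illustration; the same set of tags is simply repeated for each employee.

```xml
<?xml version="1.0"?>
<!-- Every employee record reuses the same vocabulary of tags -->
<Employees>
  <Employee>
    <Name>John Smith</Name>
    <Age>30</Age>
    <Designation>HR Executive</Designation>
    <Department>Human Resources</Department>
  </Employee>
  <!-- Invented second record, shown only to illustrate reuse of the template -->
  <Employee>
    <Name>Jane Doe</Name>
    <Age>28</Age>
    <Designation>Technical Editor</Designation>
    <Department>Publishing</Department>
  </Employee>
</Employees>
```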

  • XML separates data from its presentation. XML is used to describe data; you cannot use XML to format data for display in a Web page. In contrast, a markup language such as HTML combines the data with information about how the data is presented in a Web page. Therefore, to present the same data in different formats by using HTML, you need to create separate Web pages.

To format the data in an XML document, you can use style sheets written in a language such as Extensible Stylesheet Language (XSL) or Cascading Style Sheets (CSS). These style sheets contain information about how to present the data in the document. As a result, using XML, you can present the same data in different formats by applying different style sheets.
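
One common way to attach a style sheet to an XML document is the xml-stylesheet processing instruction. The following sketch assumes a CSS file named employee.css, which is a hypothetical file name, stored in the same folder as the document:

```xml
<?xml version="1.0"?>
<!-- employee.css is a hypothetical style sheet file -->
<?xml-stylesheet type="text/css" href="employee.css"?>
<Employees>
  <Name>John Smith</Name>
  <Department>Human Resources</Department>
</Employees>
```

Pointing the href value at a different style sheet would present the same data in a different format without changing the document itself.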

Overview of DTD

XML allows you to create custom tags. However, when you create your own structured document, you need to convey the structure to the users who use the XML document. You can provide this information to the users in the form of Document Type Definitions (DTDs).

A DTD is a vocabulary that defines the structure and elements in an XML document. The XML documents that you created in the previous examples were syntactically correct, but they were not checked against any vocabulary rules. Such XML documents are called well-formed XML documents. An XML document that has a DTD associated with it and conforms to the rules in that DTD is called a valid XML document. This implies that a valid XML document is both syntactically correct and conforms to the rules of vocabulary as described in a DTD. The following bulleted list discusses some of the syntax rules that every well-formed XML document, and therefore every valid XML document, needs to follow:

  • An XML document should start with an XML declaration statement.
  • Every starting tag should have a corresponding ending tag.
  • Tags of empty elements should end with />.
  • Names of elements and attributes are case sensitive.
  • An XML document can have only one root element, which in turn contains all the elements that you need to include in the document.
  • Attribute values should be enclosed in quotes.
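
The following sketch shows a small document that follows each of these rules. The Employee, Photo, and id names and the file name john.gif are hypothetical and are used only to illustrate the rules:

```xml
<?xml version="1.0"?>
<!-- Rule: the XML declaration statement comes first -->
<Employees>
  <!-- Rule: Employees is the only root element and contains all other elements -->
  <Employee id="E001">
    <!-- Rule: the attribute value E001 is enclosed in quotes -->
    <Name>John Smith</Name>
    <!-- Rule: Name has a starting tag and a matching ending tag; names are case sensitive, so an ending tag spelled "name" would not match -->
    <Photo src="john.gif" />
    <!-- Rule: Photo is an empty element, so its tag ends with a slash -->
  </Employee>
</Employees>
```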

Consider the Employee.xml file that you created in the previous example. To make this document valid, you need to include a DTD with the document. In this case, a DTD would contain information such as the elements in the Employee.xml file and the relationship between these elements. In addition, a DTD contains the vocabulary rules that are used as standards to exchange data in an XML document. However, it is not essential to have a DTD associated with every XML document.

Advantages of Using DTDs

A DTD is used to validate the content in an XML document. When the data in an XML document is exchanged over a network, the receiving application can validate the structure of the XML document based on the rules defined in a DTD. However, to do so, the receiving application requires a validating parser.

In addition to validating a document after it is created, you can also use DTDs with the authoring tools to ensure that the document you create conforms to the rules defined in a DTD. In other words, when you create an XML document by using an authoring tool that has a DTD associated with it, the authoring tool ensures that you can use only the elements and attributes that are defined in a DTD in your document.

To use a DTD with an XML document, you first need to associate the XML document with a DTD. To do this, include the DOCTYPE declaration statement at the beginning of the XML document, after the XML declaration statement.

The DOCTYPE Declaration Statement

The DOCTYPE declaration statement includes the keyword DOCTYPE. In addition, the DOCTYPE declaration statement might include markup declaration statements enclosed in square brackets. Markup declaration statements that are included within the DOCTYPE declaration statement in this way are called the internal DTD subset. The syntax of the DOCTYPE declaration statement is as shown:

<!DOCTYPE name [markup statements]>

In the preceding syntax, name is the root element in the XML document. Consider the Employee.xml file that you created. The root element in this case is Employees; therefore, the DOCTYPE declaration statement in this case would be as shown:

<!DOCTYPE Employees [markup statements]>
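
For illustration, an internal DTD subset for such a document might look like the following sketch. The ELEMENT declarations assume that Employees holds exactly one Name, Age, Designation, and Department, which the text does not state; the actual structure of the Employee.xml file may differ.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE Employees [
  <!-- Assumed structure: Employees contains these four elements in this order -->
  <!ELEMENT Employees (Name, Age, Designation, Department)>
  <!-- (#PCDATA) means that the element contains text -->
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Age (#PCDATA)>
  <!ELEMENT Designation (#PCDATA)>
  <!ELEMENT Department (#PCDATA)>
]>
<Employees>
  <Name>John Smith</Name>
  <Age>30</Age>
  <Designation>HR Executive</Designation>
  <Department>Human Resources</Department>
</Employees>
```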

Similarly, you can include an external DTD in the XML document. To do this, include the source and path of the external DTD in the DOCTYPE declaration statement. The path of a DTD is the URL of the .dtd file.

The DOCTYPE declaration statement also includes a keyword, which can be either SYSTEM or PUBLIC.

  • The SYSTEM keyword denotes that the markup declaration statements are in the .dtd file present at the specified URL.
  • The PUBLIC keyword denotes that the DTD to be included is a well-known vocabulary, which might be available as a local copy of the .dtd file or as a .dtd file placed in a database server. If you use the PUBLIC keyword, then the application that processes the XML document needs to locate the file on its own. In this case, the DOCTYPE declaration statement contains a public identifier in addition to the URL.

The syntax for associating an external DTD with the XML document is as shown:

<!DOCTYPE name keyword "URL">

As discussed earlier, name is the root element in the XML document, and the keyword is either SYSTEM or PUBLIC.
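
As a sketch, a document that uses an external DTD with the SYSTEM keyword might begin as follows. Employees.dtd is a hypothetical file name, assumed here to sit in the same folder as the document:

```xml
<?xml version="1.0" standalone="no"?>
<!-- Employees.dtd is a hypothetical external DTD file -->
<!DOCTYPE Employees SYSTEM "Employees.dtd">
<Employees>
  <Name>John Smith</Name>
  <Age>30</Age>
  <Designation>HR Executive</Designation>
  <Department>Human Resources</Department>
</Employees>
```

With the PUBLIC keyword, a quoted public identifier precedes the URL. The XHTML 1.0 Strict document type is a widely used example: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">.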

Having looked at the DOCTYPE declaration statement, we will next discuss the other components of an XML document.

Components of an XML Document

An XML document consists of several components, such as declaration statements, elements, tags, and attributes. The following sections discuss these components in detail.

Markup Syntax

Components, such as declaration statements or markup tags, define the syntax for creating an XML document. The syntax used to create an XML document is called markup syntax. The markup syntax is used to define the structure of the data in the document. The markup syntax includes all tags, DOCTYPE declaration statements, comments, DTDs, and character references.

XML Entities

In addition to the markup syntax, an XML document consists of the content or data to be displayed. Consider the following example:
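
A listing of the following kind fits this discussion. It is a sketch assembled from the tag names and values mentioned in the explanation that follows, so the exact layout is an assumption:

```xml
<?xml version="1.0"?>
<Employees>
  <Name>John Smith</Name>
  <Age>30</Age>
  <Designation>HR Executive</Designation>
  <Department>Human Resources</Department>
</Employees>
```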

In this case, the tags <Employees>, <Name>, <Age>, <Designation>, and <Department> are the markup syntax for the Employees.xml file. However, the content of the XML file is the data enclosed within the tags, such as John Smith, 30, HR Executive, and Human Resources.

The data stored in an XML document is in the form of text, and it is commonly called an XML entity or a text entity. The text entity is used to store the text in the form of character values as defined in the Unicode Character Set. The following example shows the text data in an XML document:

<Employee_Name>John Smith</Employee_Name>

XML Declaration Statement

The XML declaration statement is included in the beginning of an XML document. It is used to indicate that the specified document is an XML document. The XML declaration statement includes a keyword, xml, preceded by a question mark (?). This statement includes the XML specification to which the XML document adheres. For example, if the XML document that you create is based on XML Specification 1.0, then the XML declaration statement would be as shown here:

<?xml version="1.0" ?>

In addition to the information about the XML version being used, you might provide information such as whether external markup declaration statements are included in the XML document. To do this, you can use the standalone keyword. Consider the following declaration statement:

<?xml version="1.0" standalone="yes"?>

The attribute value of yes in the preceding code indicates that no external markup declarations are used in the XML document. You can include external markup declaration statements in the XML document by changing the attribute value to no.


Comments

Another important component of an XML document is comment entries. Comments allow you to include instructions or notes in an XML document. These comments help you to provide metadata about the document to the users of the document. Any data that is not part of the main content or the markup syntax can be included in comment entries.

The XML processor ignores any text that you include in comment entries. This implies that the XML processor does not treat the text in the comment entries as data or markup. Therefore, you need to be careful not to place content that the document requires inside a comment.

The syntax for writing a comment in an XML document is the same as that of writing a comment in an HTML document. The syntax is as shown:

<!--comment text-->

In this syntax, a comment begins with <!-- and ends with -->, and the text between these markers is the comment entry. Consider the following example:
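
A minimal sketch of a comment placed after the XML declaration statement is shown below. The Employees and Name elements are included only to make the document complete:

```xml
<?xml version="1.0"?>
<!--This XML document contains information about the employees of the organization.-->
<Employees>
  <Name>John Smith</Name>
</Employees>
```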

The use of comments in a document provides users with additional information about the document. However, while writing comments, you need to follow the listed guidelines:

  • You cannot place a comment entry before the XML declaration statement. When a document includes an XML declaration statement, it must be the first line in the XML document. Therefore, the following code snippet results in an error:
      <!--This XML document contains information about the employees of the
      organization.-->
      <?xml version="1.0" ?>
  • You cannot include two consecutive hyphens (--) within the comment text. For instance, the following code statement produces an error:
      <!--This XML document contains information about the -- employees-- of the
      organization.-->
  • You cannot include comments within tags. For example, the following code statement produces an error:
      <Employees <!--This XML document contains information about the employees of the
      organization.--> >
  • You cannot have nested comment entries. For example, the following code statement produces an error:
      <!--This XML document contains information about the employees <!--John Smith-->
      of the organization.-->


Elements

The building blocks of any XML document are its elements. Elements are containers that contain XML data, such as text, text references, entities, and so on. You can also include elements within another element. This implies that you can have nested elements. The content within an element is called the element content. It is essential that you enclose all XML data within elements.

While creating an element, you need to include the starting tags and the ending tags, both of which contain the name of the element. As discussed earlier, you can give any user-defined name to an element.

An element name cannot include white spaces and cannot begin with a numeral. You can begin the name of an element with a letter or an underscore (_) symbol.

Consider the following example:
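
A minimal sketch of an element, assuming the Employees and Employee_Name names used elsewhere in this section:

```xml
<Employees>
  <Employee_Name>John Smith</Employee_Name>
</Employees>
```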

In the preceding code, the Employees element has <Employees> as the starting tag and </Employees> as the ending tag.

It is essential to have corresponding ending tags for all starting tags. Also, remember that the names given to elements are case sensitive. Therefore, <Employees> and <employees> are interpreted as different tags.

Empty Elements

You have seen that all elements include starting and ending tags. However, if you have to create elements with no content, you can create empty elements. Empty elements can be written in an abbreviated form. For example, if the <Employees> element in the previous example contains no text, you can write the element as shown:

<Employees />

However, if you include both starting and ending tags as shown in the following code, an error is not generated:

<Employees></Employees>

It is essential for HTML users to understand the concept of empty elements because not all the tags in HTML require ending tags. As a result, the following code in HTML would not produce an error:

<img src="APB.gif">

However, the preceding code will generate an error in XML. To avoid an error, you need to close the element, for example by ending the tag with a slash (/) as shown:

<img src="APB.gif"/>

Nested Elements

As discussed earlier, you can include nested elements in an XML document. Consider the following example:
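
As a sketch, nested elements might look like the following. The Employee, First_Name, and Last_Name elements are hypothetical names added to show more than one level of nesting:

```xml
<?xml version="1.0"?>
<Employees>
  <Employee>
    <Name>
      <First_Name>John</First_Name>
      <Last_Name>Smith</Last_Name>
    </Name>
    <Designation>HR Executive</Designation>
  </Employee>
</Employees>
```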

The output of the preceding code is shown in the following figure.

Nested Elements.


Attributes

Attributes are used to specify properties of an element. An attribute has a value associated with it. Consider the following statement:

<img src="APB.gif"/>

In the preceding code, the keyword src is an attribute of the element img, and the value assigned to the src attribute is APB.gif. Attributes allow you to provide additional information in an XML document. For example, the src attribute of the img element specifies the name of the image file to be included in the XML document. The value of an attribute is assigned to it by using the equal sign (=), and it is enclosed within quotes. Double quotes (") are used here, and single quotes are also allowed.

Similar to an element, you can assign meaningful names to attributes. An attribute name can begin only with a letter or an underscore (_) symbol and cannot include white spaces. However, an attribute value can include white spaces and can even begin with a numeral.

For a document to be valid, all the attributes that you use with an element need to be declared in a DTD. A DTD contains the ATTLIST tag that includes the attribute declaration statement for each attribute.

You can include multiple attribute definitions in one ATTLIST tag. However, to avoid confusion, it is advisable that you include a different ATTLIST tag for each attribute.

The syntax for the ATTLIST tag is as shown:

<!ATTLIST element_name attribute_name attribute_type default_declaration>

Consider that you need to create an element with the name Books. In this case, you can create an attribute with the name type and assign a value, Technical, to it. To do so, add the following statement:

<Books type="Technical" />

After you use an attribute and assign a value to it, you need to include its declaration in a DTD as shown:

<!ATTLIST Books type CDATA #IMPLIED>

In the preceding code, CDATA implies that the value of the attribute is a character string, and #IMPLIED indicates that the attribute is optional.
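
Putting the pieces together, the following sketch shows the Books element and its attribute declaration in one document by using an internal DTD subset. The ELEMENT declaration and the choice of #IMPLIED (an optional attribute) are assumptions made so that the example is complete:

```xml
<?xml version="1.0"?>
<!DOCTYPE Books [
  <!-- Assumed for this sketch: Books is an empty element -->
  <!ELEMENT Books EMPTY>
  <!-- type holds a character string and is optional -->
  <!ATTLIST Books type CDATA #IMPLIED>
]>
<Books type="Technical" />
```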

We have discussed all important components of an XML document. You can use these components to create a simple XML document as discussed in the following section.

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd