What Is the DOM? - JavaScript

Before we discuss exactly what the DOM is, we should know what led to its creation. Although the DOM was heavily influenced by the rise of Dynamic HTML in browsers, the W3C took a step backward and first applied it to XML.

Introduction to XML

The eXtensible Markup Language (XML) was derived from an earlier language called Standard Generalized Markup Language (SGML). SGML’s main purpose was to define the syntax of markup languages to represent data using tags.

Tags consist of text enclosed between a less-than symbol (<) and a greater-than symbol (>), as in <tag>. Start tags begin a particular area, such as <start>; end tags define the end of an area. They look the same as start tags but have a forward slash (/) immediately following the less-than symbol, as in </end>. SGML also defines attributes for tags, which are values assigned inside of the less-than and greater-than symbols, such as the src attribute in <img src="picture.jpg">. If this looks familiar, it should; the most famous implementation of an SGML-based language is the original HTML.

SGML was used to define the Document Type Definition (DTD) for HTML, and it is still used to write DTDs for XML. The problem with SGML is its allowances for odd syntax, which makes creating parsers for HTML a difficult problem:

  • Some start tags specifically disallow end tags, such as the HTML <img>. Including an end tag causes an error.
  • Some start tags have optional or implied end tags, such as the HTML <p>, which assumes a closing tag when it meets another <p> or several other tags.
  • Some start tags require end tags, such as the HTML <script>.
  • Tags can be embedded in any order. For instance, <b>This is a <i> sample </b> string</i> is okay even though the end tags don’t occur in reverse order of the start tags.
  • Some attributes require values, such as src in <img src="picture.jpg">.
  • Some attributes don’t require values, such as nowrap in <td nowrap>.
  • Attribute values can be defined with or without quotation marks surrounding them, so <img src="picture.jpg"> and <img src=picture.jpg> are both allowed.

All these issues make creating SGML language parsers a truly arduous task. The difficulty of knowing when to apply the rules caused a stagnation in the definition of SGML languages. This is where XML begins to fit in. XML does away with all the optional syntax of SGML that caused so many developers heartache early on. In XML, the following rules apply:

  • Every start tag must have an end tag.
  • An optional shorthand syntax represents both the start and end tags in one. This syntax uses a forward slash (/) immediately before the greater-than symbol, such as <tag />. An XML parser interprets this as being equal to <tag></tag>.
  • Tags must be embedded in an appropriate order, so end tags must mirror start tags, such as <b>this is a <i> sample </i> string </b>. It helps to think of start and end tags as similar to open and close parentheses in math: You cannot close the outermost parenthesis without first closing all the inner ones.
  • All attributes require values.
  • All attributes must use quotes around the values.
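The nesting rule above lends itself to a stack: push each start tag, and require every end tag to match the most recent one still open. The following is a minimal sketch of that idea only; the function name isProperlyNested is made up for illustration, and the regular expression is a deliberate simplification that ignores comments, CDATA sections, and the attribute rules.

```javascript
// Hypothetical helper: checks only that end tags mirror start tags.
// It is not a full XML well-formedness check.
function isProperlyNested(markup) {
  const open = []; // names of the start tags that are still open
  const tagPattern = /<(\/?)([A-Za-z][\w-]*)[^<>]*?(\/?)>/g;
  for (const [, closing, name, selfClosing] of markup.matchAll(tagPattern)) {
    if (selfClosing) continue;            // <tag /> opens and closes in one step
    if (!closing) {                       // start tag: remember it
      open.push(name);
      continue;
    }
    if (open.pop() !== name) return false; // end tag must match the innermost start tag
  }
  return open.length === 0;               // nothing may be left open
}
```

Under these assumptions, the XML-style sample <b>this is a <i> sample </i> string </b> passes, while the SGML-style sample <b>This is a <i> sample </b> string</i> is rejected because </b> arrives while <i> is still open.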

These rules make an XML parser much simpler to develop and also remove the guesswork of when and where to apply odd syntax rules. Where SGML failed to gain mainstream acceptance, XML has made tremendous inroads because of its simplicity. XML has spawned several languages in just the first six years of its existence, including MathML, SVG, RDF, RSS, SOAP, XSLT, XSL-FO, and the reformulation of HTML into XHTML.

Today XML is one of the fastest-growing technologies in the world. Its main purpose is to represent data in a structured way using plain text. In some ways, XML files are not unlike databases, which also represent a structured view of data. Here is an example XML file:

Professional JavaScript for Web Developers brings you up to speed on the latest innovations in the world of JavaScript. This book provides you with the details of JavaScript implementations in Web browsers and introduces the new capabilities relating to recently-developed technologies such as XML and Web Services.

Beginning XML, 3rd Edition, like the first two editions, begins with a broad overview of the technology and then focuses on specific facets of the various specifications for the reader. This book teaches you all you need to know about XML: what it is, how it works, what technologies surround it, and how it can best be used in a variety of situations, from simple data transfer to using XML in your Web pages. It builds on the strengths of the first and second editions, and provides new material to reflect the changes in the XML landscape - notably RSS and SVG.

If you’re a Java programmer working with XML, you probably already use some of the tools developed by the Apache Software Foundation. This book is a code-intensive guide to the Apache XML tools that are most relevant for Java developers, including Xerces, Xalan, FOP, Cocoon, Axis, and Xindice.

Every XML document begins with the XML prolog, which is the first line in the previous code, <?xml version="1.0"?>. This line alone tells parsers and browsers that this file should be parsed based on the XML rules discussed earlier. The second line, <books>, is the document element, which is the outermost start tag in the file (an element is considered the contents of a start tag and end tag). All other tags must be contained within this one in order to constitute a valid XML file. The second line of the XML file need not always contain the document element; it can come later if comments or processing instructions precede it.

The third line in this sample file is a comment, which you may recognize as the same style comment used in HTML. This is one of the syntax elements XML inherited from SGML.

A little bit farther down the page you find a <desc> tag with some special syntax inside it. The <![CDATA[ ]]> code is used to indicate text that should not be parsed, allowing special characters such as less-than and greater-than to be included without fear of breaking the XML syntax. The text must appear between <![CDATA[ and ]]> to be properly shielded from parsing. This is called a Character Data Section, or CData Section for short.

The following line is just before the second book definition:

<?page render multiple authors ?>

Even though this looks like the XML prolog, it is actually considered a different type of syntax called a processing instruction. The purpose of processing instructions (or PIs for short) is to provide extra information to programs that are processing the page, such as XML parsers. PIs are generally free form. Their only requirement is that a letter must follow the first question mark. After that point, a PI can contain any sequence of characters aside from the less-than or greater-than symbols.
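The pieces of syntax described so far (prolog, comment, CData Section, processing instruction) can be picked out of a small file with a few regular expressions. The sketch below is only an illustration under loud assumptions: the sample string is hypothetical, and only the <books> and <desc> names and the page PI are taken from the text. The <book> element, the comment wording, the description text, and the function name describeXml are all invented for the example, and pattern matching like this is no substitute for a real XML parser.

```javascript
// Hypothetical sample file; only <books>, <desc> and the "page" PI come from the text.
const sampleXml = [
  '<?xml version="1.0"?>',
  '<books>',
  '  <!-- begin the list of books -->',
  '  <book>',
  '    <desc><![CDATA[Covers <b>XML</b> & more]]></desc>',
  '  </book>',
  '  <?page render multiple authors ?>',
  '  <book>',
  '    <desc><![CDATA[Second book]]></desc>',
  '  </book>',
  '</books>'
].join('\n');

// Hypothetical helper: collects the special syntax found in an XML string.
function describeXml(xml) {
  const prolog = xml.match(/^<\?xml\s[^?]*\?>/);
  const comments = [...xml.matchAll(/<!--([\s\S]*?)-->/g)].map(m => m[1].trim());
  // Text inside a CData Section is taken verbatim, markup characters included.
  const cdata = [...xml.matchAll(/<!\[CDATA\[([\s\S]*?)\]\]>/g)].map(m => m[1]);
  // A PI starts with "<?" followed by a letter; the prolog (target "xml") is skipped.
  const pis = [...xml.matchAll(/<\?([A-Za-z][\w-]*)\s*([\s\S]*?)\?>/g)]
    .filter(m => m[1].toLowerCase() !== 'xml')
    .map(m => ({ target: m[1], data: m[2].trim() }));
  return { prolog: prolog ? prolog[0] : null, comments, cdata, pis };
}
```

Calling describeXml(sampleXml) returns one comment, two CData Section strings, and a single processing instruction whose target is page. The first CData string keeps its <b> tags and ampersand untouched, which is the shielding effect described above.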

The most common PI is used to specify a style sheet for an XML file:

<?xml-stylesheet type="text/css" href="MyStyles.css" ?>

This PI is typically placed immediately after the XML prolog and is used by Web browsers to display the XML data using particular styles.

An API for XML

After XML was defined as a language, the need arose for a way to both represent and manipulate XML code using common programming languages such as Java.

First came the Simple API for XML (SAX) project for Java. SAX provides an event-based API to parse XML. Essentially, SAX parsers start out at the beginning of the file and parse their way through the code in one straight pass, firing events every time they encounter a start tag, end tag, attribute, text, or other XML syntax. It is up to the developer, then, to determine what to do when each of these events occurs.

SAX parsers are lightweight and fast because they just parse the text and continue on their way. Their main downside is the inability to stop, go backward, or access a specific part of the XML structure without starting from the beginning of the file.
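The event-based style can be illustrated with a small scanner that walks the text once and calls back for each start tag, end tag, and run of text. This is a sketch of the idea only, not the SAX API: the function parseWithEvents and the handler names startElement, endElement, and characters are assumptions chosen for the example, and the tokenizer handles only plain elements, double-quoted attributes, and text (no comments, CData Sections, or PIs).

```javascript
// Hypothetical event-based scanner; a one-pass illustration, not a real SAX parser.
function parseWithEvents(xml, handlers = {}) {
  const noop = () => {};
  const { startElement = noop, endElement = noop, characters = noop } = handlers;
  // Alternatives: end tag | start tag with optional attributes | text run
  const token = /<\/([A-Za-z][\w-]*)\s*>|<([A-Za-z][\w-]*)((?:\s+[\w-]+="[^"]*")*)\s*(\/?)>|([^<]+)/g;
  for (const [, endName, startName, attrText, selfClosing, text] of xml.matchAll(token)) {
    if (endName) {
      endElement(endName);
    } else if (startName) {
      const attrs = {};
      for (const [, key, value] of attrText.matchAll(/([\w-]+)="([^"]*)"/g)) {
        attrs[key] = value;
      }
      startElement(startName, attrs);
      if (selfClosing) endElement(startName); // <tag/> fires both events
    } else if (text.trim()) {
      characters(text.trim());                // whitespace-only text is skipped
    }
  }
}

// Usage: the caller decides what each event means. Here they are just recorded.
const events = [];
parseWithEvents('<employee id="7"><name>Ann</name><note/></employee>', {
  startElement: (name, attrs) => events.push('start:' + name + ':' + JSON.stringify(attrs)),
  endElement: name => events.push('end:' + name),
  characters: text => events.push('text:' + text)
});
```

Nothing is kept once an event has fired, which is why this style is cheap but cannot go backward or jump to a specific part of the structure.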

The Document Object Model (DOM) is a tree-based API for XML. Its main focus isn’t just to parse XML code, but rather to represent that code using a series of interlinked objects that can be modified and accessed directly without reparsing the code.

Using the DOM, code is parsed once to create a tree model; sometimes a SAX parser is used to accomplish this. After that initial parse, the XML is fully represented in a DOM model, and the original code is no longer needed. Although the DOM is slower than SAX and requires more overhead because it creates so many objects, it is the method favored by Web browsers and JavaScript for its ease of use.
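To make the parse-once idea concrete, the sketch below turns a string into a tree of plain linked objects that can be walked afterward without touching the original text. It is an assumption-heavy illustration rather than a DOM implementation: buildTree is a made-up name, the objects only borrow the property names nodeName, nodeValue, parentNode, and childNodes, and the tokenizer accepts only attribute-free elements and text in well-formed input.

```javascript
// Hypothetical tree builder: one pass over the text, then the tree stands alone.
function buildTree(xml) {
  const root = { nodeName: '#document', parentNode: null, childNodes: [] };
  let current = root; // the node that new children are attached to
  // Alternatives: end tag | start tag without attributes | text run
  const token = /<\/([A-Za-z][\w-]*)\s*>|<([A-Za-z][\w-]*)\s*(\/?)>|([^<]+)/g;
  for (const [, endName, startName, selfClosing, text] of xml.matchAll(token)) {
    if (startName) {
      const element = { nodeName: startName, parentNode: current, childNodes: [] };
      current.childNodes.push(element);
      if (!selfClosing) current = element;      // descend into the new element
    } else if (endName) {
      current = current.parentNode || root;     // climb back up to the parent
    } else if (text.trim()) {
      current.childNodes.push({
        nodeName: '#text',
        nodeValue: text.trim(),
        parentNode: current,
        childNodes: []
      });
    }
  }
  return root;
}

// Usage with a hypothetical document: navigate the objects, not the string.
const tree = buildTree('<employees><employee><name>Example Name</name></employee></employees>');
const nameNode = tree.childNodes[0].childNodes[0].childNodes[0];
```

After the call, nameNode can be read or changed directly (for example nameNode.childNodes[0].nodeValue), which is the convenience the DOM trades memory and speed for.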

Hierarchy of nodes

So what exactly is a tree-based API? When talking about DOM trees (which are called documents), you are really talking about a hierarchy of nodes. The DOM defines the Node interface as well as a large number of node types to represent the multiple aspects of XML code:

  • Document: The very top-level node to which all other nodes are attached
  • DocumentType: The object representation of a DTD reference using the syntax <!DOCTYPE >, such as <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">. It cannot contain child nodes.
  • DocumentFragment: Can be used like a Document to hold other nodes
  • Element: Represents the contents of a start tag and end tag, such as <tag></tag> or <tag/>. This node type is the only one that can contain attributes as well as child nodes.
  • Attr: Represents an attribute name-value pair. This node type cannot have child nodes.
  • Text: Represents plain text in an XML document contained within start and end tags or inside of a CData Section. This node type cannot have child nodes.
  • CDATASection: The object representation of <![CDATA[ ]]>. This node type can have only text nodes as child nodes.
  • Entity: Represents an entity definition in a DTD, such as <!ENTITY foo "foo">. This node type cannot have child nodes.
  • EntityReference: Represents an entity reference, such as &quot;. This node type cannot have child nodes.
  • ProcessingInstruction: Represents a PI. This node type cannot have child nodes.
  • Comment: Represents an XML comment. This node type cannot have child nodes.
  • Notation: Represents notation defined in a DTD. This is rarely used.

A document is made up of a hierarchy of any number of these nodes. Consider the following XML code:

This code can be represented in a DOM document as a tree. In the figure, each rectangle represents a node in the DOM document tree, with the bold text indicating the node type and the nonbold text indicating the content of that node.

[Figure: Hierarchy of nodes]

Both the comment and <employee/> nodes are considered to be child nodes of <employees/> because they fall immediately underneath it in the tree. Likewise, <employees/> is considered the parent node of the comment and <employee/> nodes.

Similarly, <name/>, <position/>, and <comments/> are all considered child nodes of <employee/> and are also considered siblings of each other because they exist at the same level of the DOM tree and have the same parent node.
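The parent, child, and sibling relationships just described can be modeled with plain objects that link to one another. The sketch below is illustrative only: createNode and siblingsOf are hypothetical helpers, the objects merely reuse the property names parentNode and childNodes, and the tree is assembled by hand from the element names mentioned in the text rather than parsed from a file.

```javascript
// Hypothetical helper: creates a node and attaches it beneath its parent.
function createNode(nodeName, parentNode = null) {
  const node = { nodeName, parentNode, childNodes: [] };
  if (parentNode) parentNode.childNodes.push(node);
  return node;
}

// Hypothetical helper: siblings are the parent's other children.
function siblingsOf(node) {
  return node.parentNode ? node.parentNode.childNodes.filter(n => n !== node) : [];
}

// Hand-built tree using the element names from the text.
const employeesEl = createNode('employees');
const commentNode = createNode('#comment', employeesEl);
const employeeEl = createNode('employee', employeesEl);
const nameEl = createNode('name', employeeEl);
const positionEl = createNode('position', employeeEl);
const commentsEl = createNode('comments', employeeEl);

// Children of <employees/>: the comment and <employee/>.
const childNames = employeesEl.childNodes.map(n => n.nodeName);
// Siblings of <name/>: <position/> and <comments/>, which share the same parent.
const siblingNames = siblingsOf(nameEl).map(n => n.nodeName);
```

In this model a node has exactly one parentNode and any number of childNodes, so sibling lookups never need more than a single step up and a scan across.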

The <employees/> node is considered the ancestor of all nodes in this section of the tree, including its children (the comment and <employee/>) as well as their children (<name/>, <position/>, and so on, all the way down to the text node “His birthday is on 8/14/68”). The document node is considered the ancestor of all nodes in the document.

The Node interface defines 12 constants that map to the different node types:

  • Node.ELEMENT_NODE (1)
  • Node.ATTRIBUTE_NODE (2)
  • Node.TEXT_NODE (3)
  • Node.CDATA_SECTION_NODE (4)
  • Node.ENTITY_REFERENCE_NODE (5)
  • Node.ENTITY_NODE (6)
  • Node.PROCESSING_INSTRUCTION_NODE (7)
  • Node.COMMENT_NODE (8)
  • Node.DOCUMENT_NODE (9)
  • Node.DOCUMENT_TYPE_NODE (10)
  • Node.DOCUMENT_FRAGMENT_NODE (11)
  • Node.NOTATION_NODE (12)
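In a browser these constants are available on Node itself (for example Node.ELEMENT_NODE), and each node reports one of them through its nodeType property. The sketch below copies the twelve values from the list into a plain object so it can run without a DOM; the names NODE_TYPES and nodeTypeName are hypothetical and exist only for this illustration.

```javascript
// Values copied from the list above; NODE_TYPES is a stand-in for the Node constants.
const NODE_TYPES = Object.freeze({
  ELEMENT_NODE: 1,
  ATTRIBUTE_NODE: 2,
  TEXT_NODE: 3,
  CDATA_SECTION_NODE: 4,
  ENTITY_REFERENCE_NODE: 5,
  ENTITY_NODE: 6,
  PROCESSING_INSTRUCTION_NODE: 7,
  COMMENT_NODE: 8,
  DOCUMENT_NODE: 9,
  DOCUMENT_TYPE_NODE: 10,
  DOCUMENT_FRAGMENT_NODE: 11,
  NOTATION_NODE: 12
});

// Hypothetical helper: turns a numeric nodeType back into its constant name.
function nodeTypeName(nodeType) {
  const entry = Object.entries(NODE_TYPES).find(([, value]) => value === nodeType);
  return entry ? entry[0] : 'UNKNOWN';
}
```

Comparing against the named constant (someNode.nodeType === NODE_TYPES.ELEMENT_NODE) tends to read better than comparing against a bare 1, and nodeTypeName is handy when logging.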

The Node interface also defines a set of properties and methods that all node types contain. These properties and methods are listed in the following table:

[Table: properties and methods of the Node interface]

In addition to nodes, the DOM also defines some helper objects, which are used to work with nodes but are not necessarily part of a DOM document:

  • NodeList: an array of nodes indexed numerically; used to represent child nodes of an element
  • NamedNodeMap: an array of nodes indexed both numerically and by name; used to represent element attributes

These helper objects provide additional access and traversal methods for dealing with DOM documents.
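The difference between the two helpers is mostly in how items are looked up. The sketch below imitates a NamedNodeMap with a small class so that both access styles can be seen side by side. SimpleNamedNodeMap is a hypothetical stand-in, not the browser object; it borrows only the member names length, item(), and getNamedItem() from the DOM, and the attribute values are invented.

```javascript
// Hypothetical stand-in for a NamedNodeMap, built from a plain object of attributes.
class SimpleNamedNodeMap {
  constructor(attributes) {
    // Each attribute becomes a small node-like object.
    this._items = Object.entries(attributes).map(([nodeName, nodeValue]) => ({ nodeName, nodeValue }));
  }

  get length() {
    return this._items.length;
  }

  // Numeric access, as a NodeList also offers.
  item(index) {
    return this._items[index] || null;
  }

  // Access by name, which is what sets a NamedNodeMap apart.
  getNamedItem(name) {
    return this._items.find(attr => attr.nodeName === name) || null;
  }
}

// Usage with invented attribute values for an <img> element.
const imgAttributes = new SimpleNamedNodeMap({ src: 'picture.jpg', alt: 'A picture' });
const firstAttr = imgAttributes.item(0);
const altAttr = imgAttributes.getNamedItem('alt');
```

Both lookups return null when nothing matches, so callers can test the result before reading nodeValue.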

Language-Specific DOMs

Any XML-based language, such as XHTML and SVG, can make use of the core DOM just introduced because they are technically XML. However, many languages go on to define their own DOMs that extend the XML core to provide language-specific features.

Along with developing the XML DOM, the W3C concurrently developed a DOM more specific to XHTML (and HTML). This DOM defines an HTMLDocument and HTMLElement as the basis for the implementation. Each HTML element is represented by its own HTMLElement type, such as HTMLDivElement representing <div>, with the exception of a small subset of elements that don’t require special properties or methods other than those provided by HTMLElement.

