# XML Expat library - PHP

Although the functions in this section come last, they are among the most important functions available. The extensible markup language, XML, has steadily grown in popularity since being introduced in 1996. XML is a first cousin to HTML in that it, too, is derived from SGML, a generalized markup language that is nearly 20 years old. Like HTML, XML documents surround textual data with tags. Unlike HTML, XML can be used to communicate any type of data.

The functions in this section wrap the Expat library developed by James Clark. This library is part of the PHP distribution, and its purpose is parsing XML documents. These functions work differently from other PHP extensions. A stream of data is fed to the parser. As complete parts of the data are recognized, events are triggered. These parts are the tags and the data they surround. You register the events with a handler, a function you write. You may specify FALSE for the name of any handler, and those events will be ignored.

In order to avoid repeating large blocks of code, I've written one example that uses most of the functions in this section. It follows the description of xml_set_object. You will always need to create a parser. You will also want to create handlers for character data and for starting and ending tags. Some of the other handlers may not be of use in most applications. You can leave them out, and that data will be ignored by the parser.

Stig Bakken added the XML extension to PHP.

string utf8_decode(string data)
The utf8_decode function takes UTF-8 text and returns ISO-8859-1 text.

string utf8_encode(string data)
The utf8_encode function returns the data argument as UTF-8 text.
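
As a quick illustration of the two conversion functions, here is a minimal sketch that round-trips a short ISO-8859-1 string. The sample string is my own, and the error_reporting call is an assumption about your environment: as far as I know, PHP 8.2 and later flag both functions as deprecated, so the sketch silences that notice.

```php
<?php
// Assumption: newer PHP releases raise a deprecation notice for these functions.
error_reporting(E_ALL & ~E_DEPRECATED);

// "café" in ISO-8859-1: the é is the single byte 0xE9.
$latin1 = "caf\xE9";

// In UTF-8 the same character takes two bytes, 0xC3 0xA9.
$utf8 = utf8_encode($latin1);

print(strlen($latin1) . " bytes as ISO-8859-1\n");
print(strlen($utf8) . " bytes as UTF-8\n");

// Decoding should give back the original bytes.
$roundTrip = utf8_decode($utf8);
print(($roundTrip === $latin1) ? "round trip ok\n" : "round trip failed\n");
```

Characters that have no ISO-8859-1 equivalent cannot survive utf8_decode, so this pair is only suitable when you know the text fits in that character set.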

string xml_error_string(integer error)
The xml_error_string function returns the description for the given error code.

integer xml_get_current_byte_index(integer parser)
The xml_get_current_byte_index function returns the number of bytes parsed so far.

integer xml_get_current_column_number(integer parser)
The xml_get_current_column_number function returns the column number in the source file where the parser last read data. This function is useful for reporting where an error occurred.

integer xml_get_current_line_number(integer parser)
The xml_get_current_line_number function returns the line number in the source file where the parser last read data. This function is useful for reporting where an error occurred.

integer xml_get_error_code(integer parser)
The xml_get_error_code function returns the last error code generated on the given parser. Constants are defined for all the errors. If no error has occurred, XML_ERROR_NONE is returned. If given an invalid parser identifier, FALSE is returned.
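
The error and position functions are usually used together. The following is a minimal sketch, not a complete error handler; the malformed input string is made up for the illustration, and the exact error code and message depend on the XML library your PHP build uses.

```php
<?php
$parser = xml_parser_create();

// Hypothetical input: <title> is closed by the wrong tag on the second line.
$broken = "<example>\n<title>Unclosed</example>";

// Passing true as the final argument tells the parser this is all the data.
$ok = xml_parse($parser, $broken, true);

// Collect the details while the parser still exists.
$code    = xml_get_error_code($parser);
$message = xml_error_string($code);
$line    = xml_get_current_line_number($parser);
$column  = xml_get_current_column_number($parser);
$byte    = xml_get_current_byte_index($parser);

if (!$ok) {
    printf(
        "XML error %d (%s) at line %d, column %d, byte %d\n",
        $code,
        $message,
        $line,
        $column,
        $byte
    );
}
```

Reporting the line and column alongside the message makes it much easier to find the offending tag in a large file.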

boolean xml_parse(integer parser, string data, boolean final)
The xml_parse function scans over data and calls handlers you have registered. The size of the data argument is not limited. You could parse an entire file or a few bytes at a time. A typical use involves fetching data within a while loop. The final argument is optional. It tells the parser that the data you are passing is the end of the file.
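
To make the while loop concrete, here is a small sketch that feeds a document to the parser 16 bytes at a time. An in-memory stream stands in for a real file, the document text is invented, and closures stand in for named handler functions; all three are my assumptions, not part of the extension.

```php
<?php
// Stand-in for a file on disk: an in-memory stream holding a small document.
$fp = fopen("php://memory", "w+");
fwrite($fp, "<example><title>Chunked parsing</title></example>");
rewind($fp);

$tags = [];
$parser = xml_parser_create();

// Record each start tag; the end handler does nothing in this sketch.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attributes) use (&$tags) {
        $tags[] = $name;
    },
    function ($parser, $name) {
    }
);

$ok = true;
while (!feof($fp)) {
    // Tags may be split across chunks; the parser keeps state between calls.
    $chunk = fread($fp, 16);

    // feof() is true after the last chunk, which marks the data as final.
    if (!xml_parse($parser, $chunk, feof($fp))) {
        $ok = false;
        break;
    }
}
fclose($fp);

print($ok ? "Parsed tags: " . implode(", ", $tags) . "\n" : "Parse failed\n");
```

Because the parser remembers where it left off, the chunk size affects only memory use, not the events your handlers receive.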

boolean xml_parse_into_struct(integer parser, string data, array structure, array index)
The xml_parse_into_struct function parses an entire document and creates an array to describe it. You must pass the structure argument as a reference. Elements numbered from zero will be added to it. Each element will contain an associative array indexed by tag, type, level, and value. The index argument is optional. You must pass it by reference as well. It will contain elements indexed by the distinct tags found in the XML file. The value of each element will be a list of integers. These integers are indices into the structure array, so you can look up the elements of the structure array that match a given tag. If you set any handlers, they will be called when you use xml_parse_into_struct.
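
The shape of the two arrays is easier to see with a tiny document. This is a minimal sketch with an invented input string; it contains no whitespace between tags so that only the elements themselves show up in the structure array.

```php
<?php
// Hypothetical input for the sketch.
$xml = "<example><title>Hello</title><code>print</code></example>";

$parser = xml_parser_create();
$structure = [];
$index = [];

// Both arrays are filled in by reference.
xml_parse_into_struct($parser, $xml, $structure, $index);

// Walk the flat list of elements.
foreach ($structure as $i => $element) {
    $value = isset($element['value']) ? $element['value'] : '';
    print("$i: level {$element['level']} {$element['tag']} ({$element['type']}) $value\n");
}

// Use the index array to jump straight to every TITLE element.
foreach ($index['TITLE'] as $position) {
    print("TITLE found at position $position: {$structure[$position]['value']}\n");
}
```

Tag names appear in uppercase because case folding is on by default.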

integer xml_parser_create(string encoding)
Calling xml_parser_create is the first step in parsing an XML document. An identifier to be used with most of the other functions is returned. The optional encoding argument allows you to specify the character set used by the parser. The three character sets accepted are ISO-8859-1, US-ASCII, and UTF-8. The default is ISO-8859-1.
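
Here is a minimal sketch of creating a parser with an explicit character set. The sample document and the closure used as a handler are assumptions made for the illustration.

```php
<?php
// Ask for UTF-8 explicitly instead of relying on the default.
$parser = xml_parser_create("UTF-8");

// Collect character data so the effect of the encoding can be inspected.
$text = "";
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$text) {
        $text .= $data;
    }
);

// The document contains "café" encoded as UTF-8 (é is 0xC3 0xA9).
$result = xml_parse($parser, "<note>caf\xC3\xA9</note>", true);

print($result ? "parsed " . strlen($text) . " bytes of text\n" : "failed\n");
```

With a UTF-8 parser the handler receives the text as UTF-8, so the accented character arrives as two bytes.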

boolean xml_parser_free(integer parser)
The xml_parser_free function releases the memory being used by the parser.

xml_parser_get_option(integer parser, integer option)
The xml_parser_get_option function returns an option's current value.

xml_set_object(integer parser, object container)
The xml_set_object function associates an object with a parser. You must pass the parser identifier and a reference to an object. This is best done within the object using the $this variable. After using this function, PHP will call methods of the object instead of functions in the global scope when you name handlers.
<?php
class myParser
{
    var $parser;

    function parse($filename)
    {
        //create parser
        if(!($this->parser = xml_parser_create()))
        {
            print("Could not create parser!<BR>\n");
            exit();
        }

        //associate parser with this object
        //(call-time pass-by-reference, &$this, is a fatal error in PHP 5.4 and later)
        xml_set_object($this->parser, $this);

        //register handlers
        xml_set_character_data_handler($this->parser, "cdataHandler");
        xml_set_element_handler($this->parser, "startHandler", "endHandler");

        /*
        ** Parse file
        */
        if(!($fp = fopen($filename, "r")))
        {
            print("Couldn't open $filename!<BR>\n");
            xml_parser_free($this->parser);
            return;
        }

        //feed the file to the parser in 1K pieces
        while($line = fread($fp, 1024))
        {
            xml_parse($this->parser, $line, feof($fp));
        }
        fclose($fp);

        //destroy parser
        xml_parser_free($this->parser);
    }

    //print character data unchanged
    function cdataHandler($parser, $data)
    {
        print($data);
    }

    //translate start tags into HTML
    function startHandler($parser, $name, $attributes)
    {
        switch($name)
        {
            case 'EXAMPLE':
                print("<HR>\n");
                break;
            case 'TITLE':
                print("<B>");
                break;
            case 'CODE':
                print("<PRE>");
                break;
            default:
                //ignore other tags
        }
    }

    //translate end tags into HTML
    function endHandler($parser, $name)
    {
        switch($name)
        {
            case 'EXAMPLE':
                print("<HR>\n");
                break;
            case 'TITLE':
                print("</B>");
                break;
            case 'CODE':
                print("</PRE>");
                break;
            default:
                //ignore other tags
        }
    }
}

$p = new myParser;
$p->parse("example.xml");
?>

xml_parser_set_option(integer parser, integer option, value data)
Use xml_parser_set_option to change the value of an option.
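
As an illustration of setting and reading an option, this minimal sketch turns off case folding, the option that converts tag names to uppercase. The sample document and the closures used as handlers are my own assumptions.

```php
<?php
$parser = xml_parser_create();

// Turn case folding off so tag names keep the case used in the document.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Record the tag names exactly as the parser reports them.
$names = [];
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attributes) use (&$names) {
        $names[] = $name;
    },
    function ($parser, $name) {
    }
);

xml_parse($parser, "<example><Title>Mixed case</Title></example>", true);

// Read the option back; a false or zero value means folding is off.
$folding = xml_parser_get_option($parser, XML_OPTION_CASE_FOLDING);

print("Case folding is " . ($folding ? "on" : "off") . "\n");
print("Tags seen: " . implode(", ", $names) . "\n");
```

Set options before the first call to xml_parse so that every tag in the document is treated the same way.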

boolean xml_set_character_data_handler(integer parser, string function)
Character data is the text that appears between tags, and xml_set_character_data_handler sets the function that executes when it is encountered. Character data may span many lines and may cause several events. PHP will not concatenate the data for you. The function specified in the function argument must take two arguments. The first is the parser identifier, an integer. The second is a string containing the character data.
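
Because one run of text can arrive in several pieces, a handler usually appends to a buffer. The following is a minimal sketch of that idea; the sample document is invented, a closure stands in for a named handler function, and the number of calls you see may differ from one build to another.

```php
<?php
$text = "";   // buffer that accumulates the pieces
$calls = 0;   // how many times the handler ran

$parser = xml_parser_create();
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$text, &$calls) {
        // Append rather than assign: each call may carry only part of the text.
        $text .= $data;
        $calls++;
    }
);

// The entity in the middle typically splits the text into several events.
xml_parse($parser, "<title>Tom &amp; Jerry</title>", true);

print("$calls call(s) produced: $text\n");
```

In a real application you would clear the buffer in the start tag handler and use its contents in the end tag handler.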

boolean xml_set_default_handler(integer parser, string function)
The xml_set_default_handler function captures any text not handled by the other handlers. This includes the DTD declaration and the XML tag. The function specified in the function argument must take two arguments. The first is the parser identifier, an integer. The second is a string containing the data.

boolean xml_set_element_handler(integer parser, string start, string end)
Use xml_set_element_handler to assign the two functions that handle start tags and end tags. The start argument must name a function you've created that takes three arguments. The first argument is the parser identifier. The second is the name of the start tag found. The third is an array of the attributes for the start tag. The indices of this array are the attribute names. The elements are in the same order as they appeared in the XML. The second function handles end tags. It takes two arguments, the first of which is the parser identifier. The other is the name of the tag.
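
A common use of the two handlers is tracking how deeply tags are nested. This minimal sketch builds an indented outline; the document, the variable names, and the closures used in place of named functions are all assumptions made for the illustration.

```php
<?php
$depth = 0;    // current nesting level
$lines = [];   // one outline line per start tag

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // Start handler: parser, tag name, attribute array.
    function ($parser, $name, $attributes) use (&$depth, &$lines) {
        $line = str_repeat("  ", $depth) . $name;
        foreach ($attributes as $key => $value) {
            $line .= " $key=$value";
        }
        $lines[] = $line;
        $depth++;
    },
    // End handler: parser, tag name.
    function ($parser, $name) use (&$depth) {
        $depth--;
    }
);

xml_parse($parser, '<example id="1"><title lang="en">Hi</title></example>', true);

print(implode("\n", $lines) . "\n");
```

With the default case folding, attribute names are converted to uppercase along with tag names, while attribute values are left alone.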

boolean xml_set_external_entity_ref_handler(integer parser, string function)
XML entities follow the form of HTML entities. They start with an ampersand and end with a semicolon. Between these two characters is the name of the entity. An external entity is defined in another file. This takes the form <!ENTITY externalEntity SYSTEM "entities.xml"> in your XML file. Each time the entity appears in the body of the XML file, the handler you specify in xml_set_external_entity_ref_handler is called. The handler function must take five arguments. First is the parser identifier. Next is a string containing the names of the entities open for this parser. Then come the base, the system ID, and the public ID.

boolean xml_set_notation_decl_handler(integer parser, string function)
The handler registered with xml_set_notation_decl_handler receives notation declarations. These are formed like <!NOTATION jpg SYSTEM "/usr/local/bin/jview"> and are meant to suggest a program for handling a data type. The handler must take five arguments, the first of which is the parser identifier. The second is the name of the notation entity. The rest are base, system ID, and public ID, in that order.

boolean xml_set_processing_instruction_handler(integer parser, string function)
The xml_set_processing_instruction_handler function registers the function that handles tags of the following form: <?target data?>. This may be familiar; it's how PHP code is embedded in files. The target keyword identifies the type of data inside the tag. Everything else is data. The function argument must specify a function that takes three arguments. The first is the parser identifier. The second is the target. The third is the data.
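
Here is a minimal sketch of a processing instruction handler. The target name report and the sample document are invented, and a closure stands in for a named function. The handler only records what it receives.

```php
<?php
$seen = [];   // list of [target, data] pairs

$parser = xml_parser_create();
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, $target, $data) use (&$seen) {
        // Record the instruction; deciding what to do with it is up to you.
        $seen[] = [$target, $data];
    }
);

xml_parse($parser, "<doc><?report generate now?></doc>", true);

foreach ($seen as $instruction) {
    print("target: {$instruction[0]}, data: {$instruction[1]}\n");
}
```

Be wary of handlers that execute the data for a php target; doing that with documents you don't control lets anyone run code on your server.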

boolean xml_set_unparsed_entity_decl_handler(integer parser, string function)
This function specifies a handler for external entities that contain an NDATA element. These take the form of <!ENTITY php-pic SYSTEM "php.jpg" NDATA jpg>, and they specify an external file.