Input
When an XML document is loaded using DOMDocument::load/DOMDocument::loadXML, there are several libxml options that affect how the document is processed. Here are some of the most useful:
Option | Description |
---|---|
LIBXML_DTDLOAD | Load the DTD for this XML file, as specified in the DOCTYPE declaration and possibly located via /etc/xml/catalog |
LIBXML_NOENT | Substitute named entity references with the replacement text defined for them in the DTD |
LIBXML_NOCDATA | Convert CDATA blocks into text nodes |
LIBXML_DTDATTR | Add default attributes specified in the DTD if they're missing from XML elements |
LIBXML_DTDVALID | Validate the XML document against the DTD |
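As a minimal sketch of what two of these options change in the parsed tree, the following uses a hypothetical document whose entity is declared in an internal DTD subset, so no external DTD file is involved. The element and entity names are made up for illustration.

```php
<?php
// Hypothetical sample: the entity is declared in the internal DTD subset.
$xml = '<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ENTITY greeting "hello">
]>
<note><a>&greeting;</a><b><![CDATA[x < y]]></b></note>';

// Default parse: the entity reference and the CDATA section are kept as such.
$plain = new DOMDocument();
$plain->loadXML($xml);
$plainOut = $plain->saveXML($plain->documentElement);

// With LIBXML_NOENT | LIBXML_NOCDATA both become ordinary text nodes.
$converted = new DOMDocument();
$converted->loadXML($xml, LIBXML_NOENT | LIBXML_NOCDATA);
$convertedOut = $converted->saveXML($converted->documentElement);

print $plainOut . "\n";     // <note><a>&greeting;</a><b><![CDATA[x < y]]></b></note>
print $convertedOut . "\n"; // <note><a>hello</a><b>x &lt; y</b></note>
```

Serialising only the root element keeps the comparison focused on the nodes that the options affect.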
There are also related PHP DOMDocument properties that can be set, but it's best to use the libxml options above to have exact control over what happens:
Property | Equivalent |
---|---|
$dom->resolveExternals | LIBXML_DTDLOAD and LIBXML_DTDATTR combined |
$dom->substituteEntities | LIBXML_NOENT |
$dom->validateOnParse | LIBXML_DTDLOAD and LIBXML_DTDVALID combined |
$dom->preserveWhiteSpace | LIBXML_NOBLANKS when set to FALSE (the default, TRUE, keeps redundant white space) |
Instead of setting LIBXML_DTDVALID or $dom->validateOnParse, the method DOMDocument::validate() can be called to validate the document after it has been parsed.
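A minimal sketch of validating after parsing, assuming two hypothetical documents that carry their DTD in an internal subset. The use of libxml_use_internal_errors() here is a choice made for this illustration, so that validation problems can be inspected rather than raised as PHP warnings.

```php
<?php
// Hypothetical documents with the DTD in the internal subset.
$dtd = '<!DOCTYPE body [
  <!ELEMENT body (div+)>
  <!ELEMENT div (#PCDATA)>
]>';
$validXml   = $dtd . '<body><div>ok</div></body>';
$invalidXml = $dtd . '<body><span>not declared</span></body>';

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadXML($validXml);
$validResult = $dom->validate();

$dom = new DOMDocument();
$dom->loadXML($invalidXml);        // well-formed, so parsing itself succeeds
$invalidResult = $dom->validate(); // validation happens only here
$errors = libxml_get_errors();
libxml_clear_errors();

var_dump($validResult, $invalidResult);
foreach ($errors as $error) {
    print trim($error->message) . "\n";
}
```

Because no validation option is passed to loadXML(), the second document loads without complaint and is only rejected when validate() is called.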
Output
There's one libxml option that can be used with DOMDocument::save/DOMDocument::saveXML to affect the output XML:
Option | Description |
---|---|
LIBXML_NOEMPTYTAG | Expand self-closing empty tags |
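A short sketch of the effect, using a hypothetical two-element document and serialising its root element with and without the option:

```php
<?php
// Hypothetical sample with one empty and one non-empty element.
$dom = new DOMDocument();
$dom->loadXML('<root><empty/><full>text</full></root>');

// Default serialisation keeps the self-closing form.
$default = $dom->saveXML($dom->documentElement);

// LIBXML_NOEMPTYTAG writes an explicit start and end tag instead.
$expanded = $dom->saveXML($dom->documentElement, LIBXML_NOEMPTYTAG);

print $default . "\n";  // <root><empty/><full>text</full></root>
print $expanded . "\n"; // <root><empty></empty><full>text</full></root>
```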
There is also one DOMDocument property that can be set before using DOMDocument::save/DOMDocument::saveXML to output XML:
Property | Description |
---|---|
$dom->formatOutput | Indent and format the output |
Note that if the document contains white space between elements, formatOutput has no effect on the output unless preserveWhiteSpace is set to FALSE before the document is loaded.
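The interaction between the two properties can be sketched with a small hypothetical document that has line breaks between its elements:

```php
<?php
// Hypothetical sample with white space (newlines) between elements.
$xml = "<root>\n<a>1</a>\n<b>2</b></root>";

// White space preserved (the default): formatOutput changes nothing.
$kept = new DOMDocument();
$kept->loadXML($xml);
$kept->formatOutput = true;
$keptOut = $kept->saveXML($kept->documentElement);

// White space dropped at load time: formatOutput can now re-indent.
$stripped = new DOMDocument();
$stripped->preserveWhiteSpace = false; // must be set before loading
$stripped->loadXML($xml);
$stripped->formatOutput = true;
$strippedOut = $stripped->saveXML($stripped->documentElement);

print $keptOut . "\n\n";
print $strippedOut . "\n";
```

In the first case the serialised root element matches the input string; in the second, each child element is written on its own indented line.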
Example code
First, a DTD file, saved as example.dtd:
<!-- define the entity "omegachar" -->
<!ENTITY omegachar "Ω">
<!-- set a default "title" attribute for "div" elements -->
<!ATTLIST div
title CDATA "default title">
<!-- define allowable elements and their contents -->
<!ELEMENT body (div+)>
<!ELEMENT div (p*, br*)>
<!ELEMENT p (#PCDATA)>
<!ELEMENT br (#PCDATA)>
Then some example XML that references the DTD:
$xml = '<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE body SYSTEM "example.dtd">
<body><div><p>Ω</p>
<p>&omegachar;</p><p><![CDATA[<test>]]></p><br/></div></body>';
And some code to create and output DOM documents:
<?php
// define the libxml options
$options = LIBXML_DTDLOAD | LIBXML_NOENT | LIBXML_DTDVALID | LIBXML_NOCDATA;
$dom = new DOMDocument();
$dom->loadXML($xml, $options); // load using libxml options
print 'No default attributes and unformatted output; named entity converted:' . "\n";
print $dom->saveXML($dom) . "\n";
$dom = new DOMDocument();
$dom->preserveWhiteSpace = FALSE;
$dom->loadXML($xml, $options); // load using libxml options
$dom->formatOutput = TRUE;
print 'Formatted output and no empty tags:' . "\n";
print $dom->saveXML($dom, LIBXML_NOEMPTYTAG) . "\n";
// load with DOMDocument properties instead of libxml options
$dom = new DOMDocument();
$dom->resolveExternals = TRUE;
$dom->substituteEntities = TRUE;
$dom->loadXML($xml);
print 'Default attributes added due to resolveExternals; CDATA nodes unchanged:' . "\n";
print $dom->saveXML($dom) . "\n";
which produces this:
No default attributes and unformatted output; named entity converted:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE body SYSTEM "example.dtd">
<body><div><p>Ω</p>
<p>Ω</p><p>&lt;test&gt;</p><br/></div></body>
Formatted output and no empty tags:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE body SYSTEM "example.dtd">
<body>
<div>
<p>Ω</p>
<p>Ω</p>
<p>&lt;test&gt;</p>
<br></br>
</div>
</body>
Default attributes added due to resolveExternals; CDATA nodes unchanged:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE body SYSTEM "example.dtd">
<body><div title="default title"><p>Ω</p>
<p>Ω</p><p><![CDATA[<test>]]></p><br/></div></body>