REST Web Services, XML and Data Typing

Say you call a REST web service, and it returns some XML:

$data = '<items>
  <item>
    <property>a</property>
    <property>b</property>
  </item>
  <!-- maybe more 'item' elements here -->
</items>';

You can parse the result in PHP using DOMDocument, but it's verbose because there's a lot of selecting (DOMDocument::getElementsByTagName or DOMXPath::query), iterating and casting DOMElements to strings using nodeValue:

$dom = new DOMDocument();
$dom->loadXML($data); // loadXML() is an instance method; calling it statically is an error in PHP 8
foreach ($dom->getElementsByTagName('item') as $item)
  foreach ($item->getElementsByTagName('property') as $property)
    print $property->nodeValue;
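
For comparison, here is a rough sketch of the DOMXPath::query route mentioned above. It collapses the two loops into one query, but each node still has to be converted by hand via nodeValue. The XPath expression assumes the exact document shape shown at the top of the post.

```php
<?php
$data = '<items>
  <item>
    <property>a</property>
    <property>b</property>
  </item>
</items>';

$dom = new DOMDocument();
$dom->loadXML($data);

$xpath = new DOMXPath($dom);
$values = [];
// A single query selects every 'property' under every 'item'
foreach ($xpath->query('/items/item/property') as $property) {
    $values[] = $property->nodeValue; // still a manual conversion to string
}
print implode(',', $values); // prints "a,b"
```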

You can parse the result using SimpleXML, which is more straightforward, but you still have to cast each value to a string (or integer) before you can use it:

$xml = simplexml_load_string($data);
foreach ($xml->item as $item)
  foreach ($item->property as $property)
    print (string) $property;

One way around this is to round-trip the data through JSON, which casts nodes to strings, arrays or objects as appropriate:

$data = json_decode(json_encode(simplexml_load_string($data)));
foreach ($data->item as $item) // breaks here
  foreach ($item->property as $property) // Warning: Invalid argument supplied for foreach()
    print $property;

The trouble here is that the XML file didn't contain any information about whether 'item' should be cast to an array of 'item' elements or a single 'item' object. In this case there was only one 'item' element, so it became an object; the outer foreach then walked that object's properties instead of a list of items, and the inner loop broke.
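
A common workaround is to wrap anything that might be a list before iterating over it. The sketch below is only that: as_list() is a hypothetical helper name, and it relies on you already knowing which elements are meant to repeat, which is exactly the information the XML doesn't carry.

```php
<?php
// Hypothetical helper: always return a list, whether json_decode()
// produced a single value/object or an array of them.
function as_list($value): array
{
    if ($value === null) {
        return [];
    }
    return is_array($value) ? $value : [$value];
}

$xml = '<items><item><property>a</property><property>b</property></item></items>';
$data = json_decode(json_encode(simplexml_load_string($xml)));

$out = [];
foreach (as_list($data->item ?? null) as $item) {
    foreach (as_list($item->property ?? null) as $property) {
        $out[] = $property;
    }
}
print implode(',', $out); // prints "a,b"
```

It works, but every access to a possibly-repeating element has to be wrapped by hand.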

Web services that return JSON don't have this problem, as JavaScript/JSON has an array type:

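// "item" is an explicit array of objects (JSON itself does not allow comments,
// so a comment inside the string would make json_decode() return null)
$json = '{
  "item": [
    { "property": ["a", "b"] }
  ]
}';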
$data = json_decode($json);
foreach ($data->item as $item)
  foreach ($item->property as $property)
    print $property;

Web services that use SOAP and WSDL also don't have this problem, as the WSDL file that describes the SOAP service includes W3C XML Schema (XSD) files or sections that describe the data types of all the elements in the XML response:

<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:example="http://example.com/ns" targetNamespace="http://example.com/ns">
    <element name="items">
        <complexType>
            <sequence>
                <element minOccurs="0" maxOccurs="unbounded" name="item" type="example:item"/>
            </sequence>
        </complexType>
    </element>
    
    <complexType name="item">
        <sequence>
            <element minOccurs="0" maxOccurs="unbounded" name="property" type="string"/>
        </sequence>
    </complexType>
</schema>
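
As a rough check that a schema like this is usable from PHP, the sketch below feeds it to DOMDocument::schemaValidateSource. Two assumptions to flag: the named type is closed with </complexType>, and the sample documents put the root element in the target namespace while leaving 'item' and 'property' unqualified (the XSD default for local elements). The namespace-less $data from the top of this post would not validate against this schema as written. The is_valid() function name is made up for the example.

```php
<?php
$xsd = '<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:example="http://example.com/ns" targetNamespace="http://example.com/ns">
    <element name="items">
        <complexType>
            <sequence>
                <element minOccurs="0" maxOccurs="unbounded" name="item" type="example:item"/>
            </sequence>
        </complexType>
    </element>
    <complexType name="item">
        <sequence>
            <element minOccurs="0" maxOccurs="unbounded" name="property" type="string"/>
        </sequence>
    </complexType>
</schema>';

// Assumed sample documents: namespaced root, unqualified local elements
$valid = '<ex:items xmlns:ex="http://example.com/ns">
  <item><property>a</property><property>b</property></item>
</ex:items>';
$invalid = '<ex:items xmlns:ex="http://example.com/ns">
  <item><other>a</other></item>
</ex:items>';

// Collect validation errors internally instead of emitting PHP warnings
libxml_use_internal_errors(true);

// Hypothetical helper: true if $xml validates against the schema in $xsd
function is_valid(string $xml, string $xsd): bool
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    return $dom->schemaValidateSource($xsd);
}

var_dump(is_valid($valid, $xsd));
var_dump(is_valid($invalid, $xsd));
```

Validation only tells you whether the document conforms; it doesn't hand back typed values.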

There's also the SOAP-derived way of declaring arrays in W3C XML Schema, which looks something like this (untested, so might not be correct):

<xsd:complexType xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" name="ArrayOfString">
  <xsd:complexContent>
    <xsd:restriction base="soapenc:Array">
      <xsd:attribute ref="soapenc:arrayType" wsdl:arrayType="xsd:string[]"/>
    </xsd:restriction>
  </xsd:complexContent>
</xsd:complexType>

WSDL 2.0 defines a binding extension for HTTP, which means that you can use WSDL files to describe REST web services (and hence XSD to describe the data types found in the response). There isn't much software available to make use of this in PHP, though; WSF/PHP seems to be the main option.

Like W3C XML Schema, RELAX NG also allows data typing using the W3C XML Schema datatypes.

PHP has methods for validating documents using either W3C XML Schema (DOMDocument::schemaValidate) or RELAX NG (DOMDocument::relaxNGValidate); what I'd like to see is a method for casting an XML node to primitive datatypes using the definitions found in an XSD file.
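
To make the wish concrete, here is a very rough sketch of what such a method might do for the array-or-object question alone. It reads maxOccurs from the schema and uses that, rather than the number of elements that happen to be present, to decide what becomes a list. Everything here is an assumption for illustration: repeating_elements() and cast_node() are hypothetical names, the schema is a simplified one with no target namespace, element names are assumed to be unique, and every leaf is treated as a string (so an empty element comes back as an empty string).

```php
<?php
// Assumption: simplified schema, no target namespace, unique element names
$xsd = '<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="items">
    <complexType><sequence>
      <element name="item" minOccurs="0" maxOccurs="unbounded">
        <complexType><sequence>
          <element name="property" type="string" minOccurs="0" maxOccurs="unbounded"/>
        </sequence></complexType>
      </element>
    </sequence></complexType>
  </element>
</schema>';

// Hypothetical helper: names of elements the schema allows to repeat
function repeating_elements(string $xsd): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xsd);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('xs', 'http://www.w3.org/2001/XMLSchema');

    $names = [];
    foreach ($xpath->query('//xs:element[@name and @maxOccurs]') as $el) {
        $max = $el->getAttribute('maxOccurs');
        if ($max === 'unbounded' || (int) $max > 1) {
            $names[$el->getAttribute('name')] = true;
        }
    }
    return $names;
}

// Hypothetical helper: convert a SimpleXML node to strings and arrays,
// using the schema-derived map instead of guessing from the element count
function cast_node(SimpleXMLElement $node, array $repeating)
{
    if ($node->count() === 0) {
        return (string) $node; // leaf: treated as a string in this sketch
    }
    $result = [];
    foreach ($node->children() as $name => $child) {
        $value = cast_node($child, $repeating);
        if (isset($repeating[$name])) {
            $result[$name][] = $value; // always a list, even with one element
        } else {
            $result[$name] = $value;
        }
    }
    return $result;
}

$xml = '<items><item><property>a</property><property>b</property></item></items>';
$data = cast_node(simplexml_load_string($xml), repeating_elements($xsd));

foreach ($data['item'] as $item) {
    foreach ($item['property'] as $property) {
        print $property;
    }
}
```

A real implementation would also need to resolve named types, namespaces and the XSD primitive datatypes (integers, dates and so on), which this sketch ignores.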