RDF braindump

I don't understand this confrontation between RSS and RDF. They're both forms of XML, designed for different purposes, aren't they? If you're requesting information from a web service you need to receive information that's marked up so you understand what each piece of data means. Bog-standard XML (Google, Amazon, PubMed) makes up its own tags as it goes along and references a central DTD (document describing the tags used), while RSS (weblog updates) and RDF use namespaces to allow tags to be defined using DTDs in lots of different places. For me, the point of RDF is that you can embed it in HTML, so it describes the resource to which it is attached (hence the name), and I don't think you can embed RSS in this way. I can't explain why MusicBrainz uses RDF rather than standard XML to respond to queries; it just makes things too complicated (for my tiny mind). Danny Ayers probably answered this best:

You can't judge the benefits of RDF by judging individual applications in isolation, any more than you could judge the web by looking at a single host. Each application probably could be simpler if it used vanilla XML instead of RDF. But by using RDF they have a level of interoperability that goes far beyond what would be available otherwise. FOAF MusicBrainz data can be inserted directly into RSS 1.0 feeds and their meaning remains globally well-defined - the creator of a resource has the same meaning whether that resource is a document, blog item or piece of music. It means that the same tools, with little or no modification, can be used across all domains. The wheel doesn't have to be reinvented for every purpose.

It's the difference between web services and proper interoperability. I wouldn't want the Amazon web service to return the information about a book in RDF though, would I? I can reformat it to RDF for embedding in a web page (see blam!), or turn it into RSS for publishing a feed of newly released books, or turn it into TouchGraph XML for visualising links. Searching Google for "what is RDF for?" found a few links, most of which - like this one - are complete gibberish. This essay was the clearest answer, and the key points may be "Describes everything by nodes and arcs" and "XML serialization of directed graphs". Can you see why this is so hard for mortals to understand?


Update: It's becoming clearer. The nodes and arcs that RDF uses are equivalent to the nodes and edges of a TouchGraph graph. The bonus of RDF seems to be the use of URIs to identify both the arcs (references) and the remote resources (nodes) they point to, so that one piece of RDF can talk about many separate objects and an aggregator will know what these are. Each node can be described in many different places, but as long as the descriptions all use the same URI for it, an aggregator can collect them together and understand which node they're all describing.
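
To make the aggregator idea concrete, here is a minimal sketch in Python. It is not a real RDF toolkit: it assumes each statement is a plain (subject, predicate, object) tuple, and the example.org URIs, the titles and the aggregate() helper are all made up for illustration. Only the Dublin Core and FOAF namespace URIs are real.

```python
from collections import defaultdict

# Hypothetical node URIs, invented for this sketch.
ALBUM = "http://example.org/album/42"
ARTIST = "http://example.org/artist/7"

# Real vocabulary namespaces (Dublin Core and FOAF).
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Two separate sources, each describing some of the same nodes.
source_a = [
    (ALBUM, DC + "title", "Some Album"),
    (ALBUM, DC + "creator", ARTIST),  # an arc pointing at another node
]
source_b = [
    (ARTIST, FOAF + "name", "Some Artist"),
    (ALBUM, DC + "date", "2003"),
]


def aggregate(*sources):
    """Group arcs by the node (subject URI) they start from."""
    nodes = defaultdict(set)
    for triples in sources:
        for subject, predicate, obj in triples:
            nodes[subject].add((predicate, obj))
    return nodes


merged = aggregate(source_a, source_b)
for predicate, obj in sorted(merged[ALBUM]):
    print(predicate, "->", obj)
```

Because both sources use the same URI for the album, the aggregator ends up with one node carrying the arcs from both, without either source knowing about the other.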

Comments

http://www.jfsowa.com/pubs/semnet.htm
An in-depth description of semantic networks.

One way to think about this: the Resource Description Framework (RDF) is a family of XML applications that agree to make a certain tradeoff for the sake of cross-compatibility. In exchange for accepting a number of constraints on the way they use XML to write down claims about the world, they gain the ability to have their data freely mixed with that of other RDF applications.

Since many descriptive problems are inter-related, this is an attractive offer, even if the XML syntax is a little daunting. MusicBrainz can focus on describing music, RSS 1.0 on describing news channels, FOAF on describing people, Dublin Core on describing documents, RdfGeo on places and maps, RdfCal on describing events and meetings, Wordnet on classifying things using nouns, ChefMoz on restaurants, and so on.

Yet because they all bought into the RDF framework, any RDF document can draw on any of these 'vocabularies'. So an RSS feed could contain markup describing the people and places and music associated with a concert; a calendar entry could contain information about its location and expected attendees; a restaurant review could use FOAF to describe the reviewer; or a FOAF file could use Dublin Core to describe the documents written by its author, as well as homepages and other information about that author.
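
As a rough illustration of that last case, here is a small Python sketch of a FOAF-style description that also uses Dublin Core. It assumes statements are plain (subject, predicate, object) tuples rather than real RDF/XML; the example.org URIs, the names and the helper functions are hypothetical, and only the two namespace URIs are real.

```python
# Real vocabulary namespaces.
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical resources, invented for this sketch.
ME = "http://example.org/people/alice#me"
DOC = "http://example.org/docs/report"

# One graph, two vocabularies mixed freely.
graph = {
    (ME, FOAF + "name", "Alice"),
    (ME, FOAF + "homepage", "http://example.org/people/alice"),
    (DOC, DC + "title", "A Report"),
    (DOC, DC + "creator", ME),
}


def namespace_of(predicate):
    """Return the predicate URI up to and including its last '#' or '/'."""
    cut = max(predicate.rfind("#"), predicate.rfind("/"))
    return predicate[:cut + 1]


def vocabularies(triples):
    """Set of namespaces the graph's predicates are drawn from."""
    return {namespace_of(p) for _, p, _ in triples}


def documents_by(triples, person):
    """Titles of resources whose dc:creator is the given person."""
    made = {s for s, p, o in triples if p == DC + "creator" and o == person}
    return sorted(o for s, p, o in triples if s in made and p == DC + "title")


print(vocabularies(graph))
print(documents_by(graph, ME))
```

The query code never needs a FOAF-specific or Dublin-Core-specific parser; it only looks at predicate URIs, which is the kind of tool reuse the tradeoff is meant to buy.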

So, for any particular application, you could do it in standalone XML. RDF is designed for areas where there is a likely pay-off from overlaps and data merging, i.e. the messy world we live in, where things aren't so easily parceled up into discrete jobs.

But it is a tradeoff. Adopting RDF means that you can't just make up your XML tagging structure at random; you have to live by the 'encoding' rules expected of all RDF applications.
This is so that software written this year can have some hope of doing useful things with vocabularies invented next year: an unpredictable 'tag soup' of arbitrary mixed XML is hard to process. RDF imposes constraints so that all RDF-flavoured XML is in broadly the same style (for example, ordering of tags is usually insignificant to what those tags tell the world). Those constraints take time to learn and understand and explain, and so adopting RDF isn't without its costs.

And so the more of us who use RDF, the happier the cost/benefit tradeoff gets, since using RDF brings us into a larger and larger family of inter-mixable data.

Does this make any sense?


A discussion based on a post to Dan's RDFWeb blog - http://rdfweb.org/mt/foaflog/archives/000031.html - about how the developers of Spring are having some trouble supporting FOAF helped me realize this: although RDF is commonly encoded as XML, handling RDF isn't as simple as just parsing an XML document. You can have several equivalent RDF expressions rendered in different ways in XML. You can't just use an XPath expression to fish something out of a chunk of RDF.
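
Here is a small Python sketch of the problem, using only the standard library. The two documents below say the same thing: RDF/XML allows a string-valued property to be written either as a child element or as an attribute on the description. The example.org URI and the helper functions are made up for illustration, and toy_triples() handles only these two forms; it is nowhere near a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"rdf": RDF, "dc": DC}

# Form A: the title is a child element.
doc_a = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/post/1">
    <dc:title>Hello</dc:title>
  </rdf:Description>
</rdf:RDF>"""

# Form B: the same statement, with the title as a property attribute.
doc_b = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/post/1" dc:title="Hello"/>
</rdf:RDF>"""

PATH = "rdf:Description/dc:title"  # a path written with form A in mind


def title_by_path(xml_text):
    """Naive lookup: follow a fixed element path."""
    node = ET.fromstring(xml_text).find(PATH, NS)
    return None if node is None else node.text


def toy_triples(xml_text):
    """Extract (subject, predicate, object) from the two forms above only."""
    triples = set()
    for desc in ET.fromstring(xml_text).findall("rdf:Description", NS):
        subject = desc.get(f"{{{RDF}}}about")
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):  # skip rdf:about itself
                triples.add((subject, name, value))
        for child in desc:
            triples.add((subject, child.tag, child.text))
    return triples


print(title_by_path(doc_a))                      # found
print(title_by_path(doc_b))                      # the fixed path misses it
print(toy_triples(doc_a) == toy_triples(doc_b))  # same statement either way
```

The fixed path only works if you already know which spelling the producer chose. Reading the document as triples sidesteps that, but it means every tool needs something RDF-aware rather than a bare XML parser.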

If the RDF mailing list discussion about the issue leads to some kind of RDF canonicalization, it might make it a lot easier for tools to casually add support for RDF, but right now it's pretty tricky. Alternatively, RDF management tools need to become as ubiquitous as XML parsers.
