Open Graph wins the Semantic Web

It took me a year - and the configuration step below - to realise that Open Graph has found a solution that works for referencing things on the web:

We now have a standard way of providing metadata about any object, based on two principles:

Every object is represented by at least one HTML page on the web.
Properties of that object are represented as <meta> elements in the <head> section of that HTML page.

From that, we can make statements about any object using URIs, and fetch metadata about that object using HTTP. The Semantic Web!

Statements

This is an RDF statement:

[THING]	[LINK]	[THING]
<http://music.com/band/nirvana>	<http://example.com/member>	<http://music.com/person/kurt-cobain>.

That's two things connected by a link, all represented by URIs.

Creating a Graph

Several statements of this kind can be combined to make a graph:

<http://music.com/band/nirvana> <http://example.com/member> <http://music.com/person/kurt-cobain>.

<http://music.com/band/nirvana> <http://example.com/member> <http://music.com/person/dave-grohl>.

<http://music.com/band/nirvana> <http://example.com/member> <http://music.com/person/krist-novoselic>.

<http://music.com/band/nirvana> <http://example.com/recorded> <http://music.com/track/on-a-plain>.

Or, to write those statements in shorthand, without repeating the first part of each one:

<http://music.com/band/nirvana>

<http://example.com/member> <http://music.com/person/kurt-cobain>;

<http://example.com/member> <http://music.com/person/dave-grohl>;

<http://example.com/member> <http://music.com/person/krist-novoselic>;

<http://example.com/recorded> <http://music.com/track/on-a-plain>.

And using prefixes to avoid having to write out the full URI each time:

PREFIX eg: <http://example.com/>

<http://music.com/band/nirvana>

eg:member <http://music.com/person/kurt-cobain>;

eg:member <http://music.com/person/dave-grohl>;

eg:member <http://music.com/person/krist-novoselic>;

eg:recorded <http://music.com/track/on-a-plain>.
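
To make the shorthand concrete, here is a minimal Python sketch of the same graph as a set of (subject, link, object) tuples. The names expand, objects and PREFIXES are my own for illustration and don't come from any RDF library; a real project would more likely use an existing RDF toolkit.

```python
# Prefix table matching the PREFIX line in the example above.
PREFIXES = {"eg": "http://example.com/"}

NIRVANA = "http://music.com/band/nirvana"


def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name such as 'eg:member' into a full URI.

    Terms whose prefix is not in the table (including full http:// URIs)
    are returned unchanged.
    """
    prefix, separator, local_name = term.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local_name
    return term


# The four statements from the graph, one tuple per statement.
triples = {
    (NIRVANA, expand("eg:member"), "http://music.com/person/kurt-cobain"),
    (NIRVANA, expand("eg:member"), "http://music.com/person/dave-grohl"),
    (NIRVANA, expand("eg:member"), "http://music.com/person/krist-novoselic"),
    (NIRVANA, expand("eg:recorded"), "http://music.com/track/on-a-plain"),
}


def objects(graph, subject, link):
    """Return every object connected to `subject` by `link`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == link)


if __name__ == "__main__":
    for member in objects(triples, NIRVANA, expand("eg:member")):
        print(member)
```

Whether the statements are written in full, in shorthand or with prefixes, they expand to the same set of tuples, which is why the three notations are interchangeable.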

Fetching information

The URI <http://music.com/band/nirvana> represents the band Nirvana. We could equally have used <http://en.wikipedia.org/wiki/Nirvana_(band)> or <http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh>*. As these are HTTP URLs, a representation of this Thing can be fetched using HTTP - in this case, your web browser probably receives an HTML representation of the band.

How does the server decide in which format to return that information? There's a negotiation between the client that requests the information and the server that provides it. The request lists the formats that the client is able to handle, along with its preferences (in HTTP, this is the Accept header), and the server returns the most preferred of those that it's able to provide. The information about a Thing might also be available as JSON, or XML, or any other format, but Open Graph requires that every Thing identified by a URL must have an HTML web page that represents it.
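
As a rough illustration of that negotiation, here is a simplified Python sketch of the server's side of the decision. The function names parse_accept and negotiate are hypothetical, and the logic is deliberately incomplete: it understands q-values and the */* wildcard, but ignores partial wildcards such as text/* and other media type parameters.

```python
def parse_accept(header):
    """Return the media types in an Accept header, most preferred first.

    Types with q=0 are dropped, since that marks them as not acceptable.
    Ties are broken by the order in which the types were listed.
    """
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # default when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def negotiate(accept_header, available):
    """Pick the format to return, or None if nothing acceptable is available.

    `available` lists the formats the server can produce, in the server's
    own order of preference (used only when the client sends */*).
    """
    for media_type in parse_accept(accept_header):
        if media_type in available:
            return media_type
        if media_type == "*/*" and available:
            return available[0]
    return None


if __name__ == "__main__":
    server_formats = ["text/html", "application/json"]
    print(negotiate("application/json;q=0.8, text/html", server_formats))
```

A server that follows the Open Graph rule always has text/html in its list of available formats, so a client that asks for HTML never comes away empty-handed.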

In this way, we can make statements about any Thing, and fetch information about that Thing by dereferencing its URL to see what information it provides.

How should the information about the Thing be presented in that HTML page**? As the page represents the Thing***, this information can be added to the <head> section of the page as <meta> elements, one per property, each pairing a property name (such as eg:member) with its value.

That is exactly the same information as in the shorthand RDF statements above. It's RDF in HTML!
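
To show the correspondence, here is a small Python sketch that reads the properties back out of a page. The HTML in SAMPLE_HTML is my own reconstruction of what such a <head> might look like for the Nirvana example, not markup taken from a real site, and MetaParser and extract_properties are hypothetical names. It only looks at meta elements that have a property attribute.

```python
from html.parser import HTMLParser

# Assumed example markup: the Nirvana graph expressed as <meta> elements.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<head>
  <title>Nirvana</title>
  <meta property="eg:member" content="http://music.com/person/kurt-cobain">
  <meta property="eg:member" content="http://music.com/person/dave-grohl">
  <meta property="eg:member" content="http://music.com/person/krist-novoselic">
  <meta property="eg:recorded" content="http://music.com/track/on-a-plain">
</head>
<body><p>A page about the band.</p></body>
</html>"""


class MetaParser(HTMLParser):
    """Collect (property, content) pairs from <meta> elements inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.properties = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            name = attributes.get("property")
            value = attributes.get("content")
            if name and value is not None:
                self.properties.append((name, value))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_properties(html):
    """Return the list of (property, content) pairs found in the page head."""
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    return parser.properties


if __name__ == "__main__":
    page_url = "http://music.com/band/nirvana"
    for name, value in extract_properties(SAMPLE_HTML):
        # Each pair is one statement whose subject is the page's own URL.
        print(page_url, name, value)
```

The subject of every statement is implicit: it is the URL of the page itself, which is why each Thing needs a page of its own.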

If someone says they like the album "Nevermind", a statement is created:
<http://facebook.com/eaton.alf> <http://example.com/emotions/likes> <http://open.spotify.com/album/6okv1avxEgYSdc2JYy6ZEi>.

When we fetch the HTML document from the URL referenced (<http://open.spotify.com/album/6okv1avxEgYSdc2JYy6ZEi>), it contains (amongst other things) this information:

<http://open.spotify.com/album/6okv1avxEgYSdc2JYy6ZEi>

og:type "music.album";

og:title "Nevermind";

music:release_date "1991-01-01";

music:musician <http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh>.

And when we fetch the HTML document from the "musician" URL <http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh>, it contains (amongst other things) this information:

<http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh>

og:title "Nirvana".

When all that information is combined, we know that this person, who clicked the "Like" button while listening to an album in Spotify, liked the album "Nevermind" by the musician "Nirvana" - which is what gets displayed in their Facebook timeline.
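
The combining step can be sketched in a few lines of Python. To keep the example self-contained, the PAGES dictionary stands in for real HTTP requests and holds the properties quoted above; fetch_properties, first and describe_like are hypothetical names, and this is only a guess at the general shape of the process, not a description of what Facebook actually runs.

```python
ALBUM = "http://open.spotify.com/album/6okv1avxEgYSdc2JYy6ZEi"
ARTIST = "http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh"

# Stand-in for the web: URL -> properties found in that page's <head>.
PAGES = {
    ALBUM: [
        ("og:type", "music.album"),
        ("og:title", "Nevermind"),
        ("music:release_date", "1991-01-01"),
        ("music:musician", ARTIST),
    ],
    ARTIST: [
        ("og:title", "Nirvana"),
    ],
}


def fetch_properties(url):
    """Pretend to dereference a URL; a real version would fetch and parse HTML."""
    return PAGES.get(url, [])


def first(properties, name):
    """Return the first value recorded for a property, or None."""
    for key, value in properties:
        if key == name:
            return value
    return None


def describe_like(person_url, thing_url):
    """Follow the links from a 'like' statement and gather display details."""
    thing = fetch_properties(thing_url)
    musician_url = first(thing, "music:musician")
    musician = fetch_properties(musician_url) if musician_url else []
    return {
        "person": person_url,
        "type": first(thing, "og:type"),
        "title": first(thing, "og:title"),
        "musician": first(musician, "og:title"),
    }


if __name__ == "__main__":
    summary = describe_like("http://facebook.com/eaton.alf", ALBUM)
    print("{person} likes the {type} {title} by {musician}".format(**summary))
```

The only thing the "like" statement itself carries is three URLs; everything that ends up on screen is fetched by dereferencing them.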

Referring to URLs

This is all relevant to my recent post about citing with URIs. In that demonstration, the script dereferenced the URI to get information about the thing, but specifically asked for JSON. In the end, though, the JSON is basically just a list of properties about the thing being referenced, and there's no reason why that information can't be represented in <meta> elements in the <head> of an HTML page, which is exactly what most publishers do in order to get their documents indexed by Google Scholar. They use several prefixes ("dc.", "prism.", "citation_"); they often use meta[name][content] instead of meta[property][content], but it's all basically the same thing. I've now updated the script to parse <meta> elements from HTML, alongside JSON responses.

In summary: if someone wants to refer to a Thing, they should be able to use an HTTP URL. If someone wants to get information about that Thing, they should be able to dereference that URL, get an HTML document, look in the <meta> elements in the <head> section, and retrieve all the information about that Thing (including further URLs to find out more information about any of those properties).

* Asserting equivalence between URIs allows links from one URI to also apply to the other. For example:
<http://music.com/band/nirvana> <http://www.w3.org/2002/07/owl#sameAs> <http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh>.

** We don't have to worry about representing multiple items on a single page - each one will have a link to its own, individual page.

*** We don't have to worry about whether the URI represents the Thing or a document about the Thing: it's always the Thing. Most of the time, no-one cares who wrote the document about the Thing, or when that document was last updated. An exception might be Wikipedia, so I have a suggestion: the Thing is still represented by the web page; information about authors and update times can be attached to an appropriate property of the Thing, e.g. <http://en.wikipedia.org/wiki/Nirvana_(band)#description>.