RVW standard metadata format for reviews

This is the standard review format as it stands at the moment. I've registered the address with PURL. Any comments are welcome; after that I'll give the format to RSS harvesters so they can pick up reviews.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rvw="http://purl.org/net/rvw">
<rdf:Description rdf:about="http://www.amazon.com/exec/obidos/ASIN/0156027321/"
    dc:identifier="ASIN:0156027321"
    dc:type="Text"
    dc:title="Life of Pi"
    dc:creator="Yann Martel"
    dc:publisher="Harvest Books"
    dc:date="2003-05"
    rvw:reviewer="HubLog"
    rvw:reviewerLocation="http://www.pmbrowser.info/hublog"
    rvw:reviewTitle="The review headline goes here"
    rvw:reviewDate="2003-05-08"
    rvw:rating="7"
    rvw:review="The body of the review goes here."
    />
</rdf:RDF>
Updated (2003-05-12) to:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rvw="http://purl.org/net/rvw">
<rdf:Description rdf:about="http://www.amazon.com/exec/obidos/ASIN/0156027321/"
    dc:identifier="ASIN:0156027321"
    dc:type="Text"
    dc:title="Life of Pi"
    dc:creator="Yann Martel"
    dc:publisher="Harvest Books"
    dc:date="2003-05"
    rvw:type="Book"
    rvw:typeURI="http://www.pmbrowser.info/rvw/types.htm"
    rvw:reviewerName="alf eaton"
    rvw:reviewerMbox_sha1sum="4057e48e7bf04a1bead63f9c6d18b4245a52db03"
    rvw:reviewerSite="HubLog"
    rvw:reviewerURI="http://www.pmbrowser.info/hublog"
    rvw:reviewTitle="The review headline goes here"
    rvw:reviewDate="2003-05-08"
    rvw:reviewBody="The body of the review goes here."
    rvw:reviewURI="http://www.pmbrowser.info/hublog/archives/1234.htm"
    rvw:rating="7"
    />
</rdf:RDF>
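
As a rough illustration of how a harvester might read this format, here is a minimal Python sketch using only the standard library. The function name read_reviews and the trimmed SAMPLE document are assumptions made for this example; they are not part of the RVW format, and a real aggregator would more likely use a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
RVW_NS = "http://purl.org/net/rvw"

# Trimmed copy of the updated example above (hypothetical test input).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rvw="http://purl.org/net/rvw">
<rdf:Description rdf:about="http://www.amazon.com/exec/obidos/ASIN/0156027321/"
    dc:title="Life of Pi"
    dc:creator="Yann Martel"
    rvw:reviewerName="alf eaton"
    rvw:reviewDate="2003-05-08"
    rvw:rating="7"
    />
</rdf:RDF>"""


def read_reviews(xml_text):
    """Return one dict per rdf:Description, keyed by 'prefix:name'."""
    prefixes = {RDF_NS: "rdf", DC_NS: "dc", RVW_NS: "rvw"}
    reviews = []
    root = ET.fromstring(xml_text)
    for desc in root.iter("{%s}Description" % RDF_NS):
        review = {}
        for qname, value in desc.attrib.items():
            # ElementTree spells namespaced attributes as "{uri}local".
            if qname.startswith("{"):
                uri, local = qname[1:].split("}", 1)
                review["%s:%s" % (prefixes.get(uri, uri), local)] = value
        reviews.append(review)
    return reviews


if __name__ == "__main__":
    for review in read_reviews(SAMPLE):
        print(review["dc:title"], review["rvw:rating"])
```

Because every field is a plain attribute on a single element, the whole review comes back as one flat dictionary, which is the simplicity the format is aiming for.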

Comments

Looks good to me.

So this format is for reviews of *anything*?

dc:identifier in this example uses ASIN:##. Maybe it should just be the ##, with rvw:identifierLocation="http://www.amazon.com/"? I really don't know what I'm talking about though, so I could be way off.

ASIN is only an identifier for Amazon anyway (Amazon Standard Identification Number), and the rdf:about field already points to the official description of the actual item, so I think having a separate identifierLocation would be extraneous.

You could use ISBN:, ISSN:, MBID: (MusicBrainz), as needed, and point the 'about' URL to the appropriate datastore.
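
To make the prefix convention concrete, here is a small Python sketch of how an application might split such an identifier into its scheme and value. The function name split_identifier is hypothetical, and treating everything before the first colon as the scheme is an assumption; the format itself does not define any parsing rule.

```python
def split_identifier(identifier):
    """Split 'SCHEME:value' into (scheme, value); scheme is None if absent."""
    scheme, sep, value = identifier.partition(":")
    if not sep:
        # No colon at all: treat the whole string as a bare value.
        return None, identifier.strip()
    return scheme.strip().upper(), value.strip()


if __name__ == "__main__":
    print(split_identifier("ASIN:0156027321"))
```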

A description of the Dublin Core Identifier element is here: http://dublincore.org/documents/dces/

Posted by: alf on May 8, 2003 1:29 AM

Cool dude - I'll get this out for wide range perusal. More details - server specs - arch drawings?

Instead of Text - I'd call this a Book.

I originally had Book, Music and DVD, as Amazon do, but the Dublin Core specification for the Type element that covers those items uses Text, Sound and Image - http://dublincore.org/documents/dcmi-type-vocabulary/

Posted by: alf on May 8, 2003 2:00 AM

Yes, but rdf:about points to something describing that *item*, not the ID set. That is, it describes book ##, but it doesn't describe what ASIN is.

Would it make sense to use the ASIN as the ID, and then point to a non-Amazon page that describes the item being reviewed? If not, then perhaps it is sort of redundant. But if so, then IMHO there should be a pointer to something that describes the ID set.

How about Text/Book? Or are slashes not permitted? We don't want to get too hung up on vocabulary. I assumed that the type could be Text, Graphic, Audio, or something like that; i.e. less specific than "book".

ah, that's what happens when you take too long to write a comment. other comments get written in the meantime...

Alf, this looks great! Let us know how to spread the word.

Great stuff!

My only suggestion would be to perhaps rename the

rvw:review

property to something like

rvw:reviewText

so you could create a standalone

rvw:Review class

unless there's a better word for it???

this wouldn't be immediately useful in RSS, but I think it could be very useful in other RDF apps.

I used a review vocabulary as an example here :
http://ideagraph.net/xmlns/ssr/modules.htm
(definitely not intended as a proposed standard) which includes a Review class.


Looks good, basically. Two remarks from me:

(1) Re. the rating property. I think this is an issue which -- when standardized early -- can have significant positive ramifications in the future.

(1a) Would it make sense to standardize this on a common scale? This would make possible machine-comparability of different reviews, as well as aggregation.

(1b) If so, I'd suggest a five-point scale, as most of the major review systems use (e.g. Amazon, BizRate, Epinions; CNET is an exception).

(2) A creator email property would be helpful to include those reviews that are not done on someone's own homepage, but on a third-party site.


[ I'll be offline for the next 14 hours, so please excuse if I can't answer should there be any discussion. ]

Including the text of the review seems like a bad idea; the Webbish approach would be to have reviews be resources with URLs of their own. Then the reviewTitle, reviewDate, and reviewer properties are no longer needed, because you can just use dc:{title,date,author}.

Also, what text format can reviews be written in? What if I want to include hyperlinks, images, or tables in a review?

I use a review schema of my own that ends up only needing a single rev:subject property; see my link for details.

AMK: I like your RDF schema as far as it goes for describing the review itself, but it wouldn't fulfil the purpose in this case, which is to make reviews available to aggregators. If the aggregator has to go back to the original page and try and work out which bit is the review and which bit is the rating, etc, it just wouldn't make sense.

AMK & Danny: It might be a good idea to have a separate 'review' class, that contains the information relating to the review itself. It just makes the RDF much more complex and harder to embed in HTML, so I'd like to avoid it if possible. I'll rename the main field to reviewBody though (or reviewText if it is just text).

Stefan: Have to standardise the ratings obviously. I dislike 5-point ratings - my brain works in decimal, and I think 7 represents 'good' much better than 3 or 4. As you say, the other systems all have scales of 5 (to save on downloading stars?), which is unfortunate.

Stefan: The creator email could come in a reviewerID field, which can be any optional form of identification - email or otherwise.

Posted by: alf on May 8, 2003 1:50 PM

Alf - the reviewBody/reviewText sounds fine to me, whatever you reckon. I do think it will be useful to reserve a keyword like 'Review' (not necessarily review, though it is the obvious choice) to use as a class. Ok, it won't be needed in the RSS application, but it could be very useful elsewhere - e.g. review archiving/cataloguing.

AMK - your schema is very elegant, but I think Alf's approach has a simplicity that seems to be pretty critical for adoption in the RSS world.

PS. Alf - once you have this in a form which you're happy with, I'll write up a SSR mapping so it can be used with (the dreaded) RSS 2.0.

Alf:

Thanks for the info on the email issue.

Re. standardised rating: The reason why most of the consumer sites use a five-star scale is twofold, but relates to the same issue, i.e. getting the "Average Joe" user to also use the system properly, not only techies or people generally used to arguing rationally / technically: First, you can put words to each of the ratings on a five-star scale ("very good", "good", "average", "bad", "very bad") which is very cumbersome with a ten-star rating ("supergood", "very good", "a little better than just good", "good" ...). Second, most people don't have so specific an opinion on a product that they would be able to differentiate between each rating on a ten-star scale. Thus, your positive bias (which always exists in review rating for reasons of self-selection, i.e. I tend to buy only products I am likely to like) is increased even further. Without explanation, people tend to go 10 = very good, 9 = good etc.

I've just updated the format to reflect most of the comments received so far - thanks for your enthusiasm.

I tried to change the rating to 3 out of 5, and couldn't do it - a sudden queasy feeling of dumbing down. NME, Pitchforkmedia, The Guardian all do their reviews on a 1-10 scale (some even with decimal points) - it just seems right.

Posted by: alf on May 12, 2003 11:42 PM

I like the changes. Also, I agree with the 10-point scale. n/10 can be converted to n/5 by applications that use the data, if they really want that.
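
A sketch of that conversion in Python, assuming a simple proportional mapping between scales. The function name rescale and its default maxima are assumptions for this example; the format does not say how a consuming application should convert.

```python
def rescale(rating, from_max=10, to_max=5):
    """Map a rating on a 0..from_max scale proportionally onto 0..to_max."""
    if not 0 <= rating <= from_max:
        raise ValueError("rating %r is outside 0..%r" % (rating, from_max))
    return rating * to_max / from_max


if __name__ == "__main__":
    # The example review's rating of 7 out of 10, expressed out of 5.
    print(rescale(7))
```

Going the other way loses nothing, but a 5-point display that only shows whole stars would have to round, so the 10-point source value is the one worth storing.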

Any thoughts on extending this to things other than books and music (e.g. laptops, cars, software, restaurants, people)? I suppose you could use the Dublin Core "Service" and "PhysicalObject" types? A more difficult thing would be to figure out what could serve as identifiers for those things. There are lots of ways this could be done, so the question is how do you agree on one. Or is there a standard that I am not aware of?

- yuri

This is the latest format for the reviews, in RSS 2.0:
http://www.pmbrowser.info/hublog/archives/000321.html

Although a unique identifier for each subject would be ideal, it is optional, so you can review whatever you like while providing as much detail as possible in fields such as title/creator/location/manufacturer etc. If standard ways of describing these attributes exist, they can be brought in using namespaces, as the Dublin Core attributes were here. Even without a unique identifier, which would be necessary for aggregating ratings, the review metadata will still be searchable.

Posted by: alf on May 26, 2003 8:45 PM

No need to standardize the ratings -- you can normalize them as long as you have enough data for each reviewer.
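
One possible reading of this, sketched in Python: express each rating as a z-score relative to that reviewer's own mean and spread, so a harsh reviewer's 6 and a generous reviewer's 9 become comparable. The z-score choice, the function name normalize_by_reviewer, and the (reviewer, item, rating) tuple layout are all assumptions for this example; the comment does not say which normalization is meant.

```python
from statistics import mean, pstdev


def normalize_by_reviewer(ratings):
    """ratings: list of (reviewer, item, rating) tuples.

    Returns a list of (reviewer, item, z) where z is the rating's distance
    from that reviewer's mean, in units of that reviewer's standard deviation.
    """
    by_reviewer = {}
    for reviewer, _item, rating in ratings:
        by_reviewer.setdefault(reviewer, []).append(rating)

    stats = {r: (mean(v), pstdev(v)) for r, v in by_reviewer.items()}

    normalized = []
    for reviewer, item, rating in ratings:
        mu, sigma = stats[reviewer]
        # A reviewer with no spread (one review, or all equal) gets 0.0.
        z = (rating - mu) / sigma if sigma else 0.0
        normalized.append((reviewer, item, z))
    return normalized
```

This only works once each reviewer has several ratings on record, which is the "enough data" condition above.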
