Zemanta API

Zemanta have an API that extracts keywords from text, assigns categories and suggests related content.


Warning: the Zemanta API Terms of Service contain the following clause:

"Zemanta ltd. will retain a copy of the content, the metadata and content enhancements submitted by you or that generated by the Zemanta service. By submitting content to or generating metadata and content enhancements through the Zemanta service, you grant Zemanta ltd. a non-exclusive perpetual, sublicensable, royalty-free license to that metadata."

The license in this clause covers only the metadata, not the submitted content itself. However, Zemanta also retains a copy of whatever you submit, so be careful what you send.


Here's some PHP code for submitting text to the API (you'll need an API key):

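<?php
// $api_key (your Zemanta API key) and $text (the text to analyse) must be set first.
$params = array(
  'method' => 'zemanta.suggest',
  'api_key' => $api_key,
  'text' => $text,
  'format' => 'json',
  'return_categories' => 'dmoz',  // also return DMOZ categories
  'return_images' => 0,           // skip image suggestions
  );
// Send the parameters as a form-encoded POST body (should really use GET, not POST).
$context = stream_context_create(array('http' => array(
  'method' => 'POST',
  'header' => 'Content-Type: application/x-www-form-urlencoded',
  'content' => http_build_query($params),
  )));
$json = file_get_contents('http://api.zemanta.com/services/rest/0.0/', false, $context);
if ($json === false) {
  die("Request to the Zemanta API failed\n");
}
$data = json_decode($json);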
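Once decoded, the response can be checked for the keywords and categories mentioned at the top of this page. The sketch below is a minimal illustration, not a documented client: it assumes the JSON has a top-level 'status' field (expected to be 'ok') plus 'keywords' and 'categories' arrays whose items each carry a 'name'. These field names are assumptions to verify against real output. The helper zemanta_names() is hypothetical, and the sample JSON is hand-written, not an actual API response.

```php
<?php
// Minimal sketch for reading names out of a decoded zemanta.suggest response.
// ASSUMPTION: the response has top-level "status", "keywords" and "categories"
// fields, and each keyword/category item has a "name"; verify against real output.

// Hypothetical helper: collect the "name" of every item in $data->$field.
function zemanta_names($data, $field)
{
    $names = array();
    if (!isset($data->$field) || !is_array($data->$field)) {
        return $names; // field missing or not a list: nothing to report
    }
    foreach ($data->$field as $item) {
        if (isset($item->name)) {
            $names[] = $item->name;
        }
    }
    return $names;
}

// Hypothetical, hand-written sample; NOT real API output.
$json = '{"status": "ok",
  "keywords": [{"name": "Example keyword", "confidence": 0.9}],
  "categories": [{"name": "Top/Science/Biology", "confidence": 0.5}]}';
$data = json_decode($json); // objects, as in the request code above

if ($data === null || !isset($data->status) || $data->status !== 'ok') {
    echo "Request failed or returned invalid JSON\n";
} else {
    print_r(zemanta_names($data, 'keywords'));
    print_r(zemanta_names($data, 'categories'));
}
```

Because json_decode() is called without the associative flag, the fields are read as object properties; JSON arrays still come back as PHP arrays, which is why is_array() works on them.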

The 'links' section of the results contains Wikipedia, DBpedia and Freebase URIs for each entity identified in the text.
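A minimal sketch for pulling those URIs out, keyed by the matched text. It assumes, without checking the API documentation, that the links sit at markup->links in the decoded response, each with an 'anchor' (the matched text) and a 'target' list of objects carrying a 'url'. Rather than relying on a per-target type label, it classifies each URL by host (wikipedia.org, dbpedia.org, freebase.com, or a subdomain of these). The function names and the sample JSON are hypothetical.

```php
<?php
// Minimal sketch: group Wikipedia/DBpedia/Freebase URIs by the text they matched.
// ASSUMPTION: links are at $data->markup->links, each with an "anchor" string and
// a "target" array of objects with a "url"; check the real response first.

// Hypothetical helper: map a URL to 'wikipedia', 'dbpedia', 'freebase' or null.
function uri_source($url)
{
    $host = (string) parse_url($url, PHP_URL_HOST);
    $domains = array('wikipedia.org' => 'wikipedia', 'dbpedia.org' => 'dbpedia', 'freebase.com' => 'freebase');
    foreach ($domains as $domain => $source) {
        $suffix = '.' . $domain;
        // match the domain itself or any subdomain such as en.wikipedia.org
        if ($host === $domain || substr($host, -strlen($suffix)) === $suffix) {
            return $source;
        }
    }
    return null;
}

// Hypothetical helper: anchor text => (source => URL); the last URL per source wins.
function zemanta_entity_uris($data)
{
    $entities = array();
    if (!isset($data->markup->links)) {
        return $entities;
    }
    foreach ($data->markup->links as $link) {
        if (!isset($link->anchor, $link->target)) {
            continue; // skip entries without the assumed fields
        }
        foreach ($link->target as $target) {
            $source = isset($target->url) ? uri_source($target->url) : null;
            if ($source !== null) {
                $entities[$link->anchor][$source] = $target->url;
            }
        }
    }
    return $entities;
}

// Hypothetical, hand-written sample; NOT real API output.
$json = '{"markup": {"links": [{"anchor": "PHP", "target": [
  {"type": "wikipedia", "url": "http://en.wikipedia.org/wiki/PHP"},
  {"type": "rdf", "url": "http://dbpedia.org/resource/PHP"},
  {"type": "homepage", "url": "http://www.php.net/"}]}]}}';
print_r(zemanta_entity_uris(json_decode($json)));
```

Classifying by host keeps the sketch tied to the three sources named above; other targets (such as the homepage URL in the sample) are simply dropped.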


Here are the results from zemanta.suggest for the text of a PubMed abstract:

"The Gram-positive bacterium Staphylococcus aureus, similar to other pathogens, binds human complement regulators Factor H and Factor H related protein 1 (FHR-1) from human serum. Here we identify the secreted protein Sbi (Staphylococcus aureus binder of IgG) as a ligand that interacts with Factor H by a-to our knowledge-new type of interaction. Factor H binds to Sbi in combination with C3b or C3d, and forms tripartite Sbi:C3:Factor H complexes. Apparently, the type of C3 influences the stability of the complex; surface plasmon resonance studies revealed a higher stability of C3d complexed to Sbi, as compared to C3b or C3. As part of this tripartite complex, Factor H is functionally active and displays complement regulatory activity. Sbi, by recruiting Factor H and C3b, acts as a potent complement inhibitor, and inhibits alternative pathway-mediated lyses of rabbit erythrocytes by human serum and sera of other species. Thus, Sbi is a multifunctional bacterial protein, which binds host complement components Factor H and C3 as well as IgG and beta(2)-glycoprotein I and interferes with innate immune recognition."