René Witte and colleagues (Semantic Software Lab, Concordia University, Montreal) have created an interesting project: Semantic Assistants.
The architecture page describes the system: it wraps NLP tools (GATE pipelines at the moment, though in theory it can be extended to other frameworks such as UIMA) in a SOAP web service, and makes the pipelines discoverable through descriptions written in OWL.
There's a Java library ("client-side abstraction layer"), which helps desktop clients call the SOAP server to annotate documents. Two example use-cases are also provided: a command-line client and an OpenOffice plugin.
The actual software (SemanticAssistants-beta1.tgz) and well-written documentation (semassist.pdf) are available from the bottom of the Semantic Assistants architecture page.
There are also several publications about the project, which provide some background.
Getting it running on OS X was a bit tricky: this is an early release, many of the paths are hard-coded and some prerequisites are missing. Here are some (possibly incomplete) notes:
JAR requirements
(yes, using Maven would have been useful...)
- gate (GATE/bin)
- gate-compiler-jdt (GATE/lib)
- gate-asm (GATE/lib)
- ontotext (GATE/lib)
- jwnl (GATE/lib)
- protege (Protege)
- orphanNodesAlg (Protege/plugins/edu.stanford.smi.protegex.owl)
- protege-owl (Protege/plugins/edu.stanford.smi.protegex.owl)
- owlsyntax (Protege/plugins/edu.stanford.smi.protegex.owl)
- iri (Protege/plugins/edu.stanford.smi.protegex.owl)
- icu4j (Protege/plugins/edu.stanford.smi.protegex.owl)
- woodstox-core-asl (Woodstox)
- stax2-api (Woodstox)
- jdom
- nekohtml
- jena
- slf4j-log4j12 (SLF4J)
- slf4j-api (SLF4J)
- log4j
- xstream
- commons-lang
- xercesImpl (Xerces2)
- PDFBox
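Since there is no Maven build to fetch these, a small script can at least report which JARs from the list above have not been found yet. This is only a sketch: the function names are my own, the directories to search are whatever you pass in, and the matching rule (base name, optionally followed by a version suffix such as -2.6.0) is an assumption about how the files are named on your system.

```python
import os
import re

# JAR base names copied from the list above; version suffixes vary by release.
REQUIRED_JARS = [
    "gate", "gate-compiler-jdt", "gate-asm", "ontotext", "jwnl",
    "protege", "orphanNodesAlg", "protege-owl", "owlsyntax", "iri", "icu4j",
    "woodstox-core-asl", "stax2-api", "jdom", "nekohtml", "jena",
    "slf4j-log4j12", "slf4j-api", "log4j", "xstream", "commons-lang",
    "xercesImpl", "PDFBox",
]


def jar_pattern(name):
    """Match 'name.jar' or 'name-<version>.jar' (assumed naming scheme).

    The version part must start with a digit, so 'gate' does not
    accidentally match 'gate-asm.jar'.
    """
    return re.compile(
        r"^" + re.escape(name) + r"(?:[-_]\d[\w.\-]*)?\.jar$",
        re.IGNORECASE,
    )


def find_missing_jars(required, search_dirs):
    """Return the names in 'required' with no matching JAR under search_dirs."""
    found_files = []
    for root in search_dirs:
        # Walk each directory recursively and collect every file name.
        for _dirpath, _dirnames, filenames in os.walk(root):
            found_files.extend(filenames)
    missing = []
    for name in required:
        pattern = jar_pattern(name)
        if not any(pattern.match(f) for f in found_files):
            missing.append(name)
    return missing


if __name__ == "__main__":
    import sys

    # Directories to search are given on the command line (default: current dir).
    dirs = sys.argv[1:] or ["."]
    for name in find_missing_jars(REQUIRED_JARS, dirs):
        print("missing:", name)
```

Run it with the GATE, Protege and other library directories as arguments; anything it prints still needs to be tracked down and added to the paths in build.xml.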
Installation steps
- Install Protege (v3) and GATE (v5) in /Applications and install JAX-WS somewhere.
- Create a GATE pipeline: open GATE, load ANNIE (File > Load ANNIE system > With defaults), then right-click the ANNIE application and choose 'Save application state'. Save the file as annie.gapp in Semantic Assistants' Resources/GatePipelines folder.
- Edit build.xml in each of the directories used in the following steps (Server, CSAL and the client directories) so that paths point to the right places on your system.
- Running ant in the 'Server' directory should compile the server and start it running.
- Running ant in the 'CSAL' directory should compile the Java library, creating semassist-client.jar in the dist directory. This helps clients communicate with the server.
- Compile the command-line client, then run it using the runclient.sh script. For example:
./runclient.sh invoke "\"Person and Location Extractor\"" "docs=http://en.wikipedia.org/w/index.php?title=Stanley_Kubrick&printable=yes"
should return a list of Person, Location and Date annotations.
- With the OpenOffice SDK installed, run ant to compile and install the OpenOffice plugin.
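The quoting in the runclient.sh example above is easy to get wrong: the service name carries its own literal double quotes, and the document URL contains an & that the shell would otherwise interpret. As a rough sketch, the same argument list can be assembled in Python. The helper names here are hypothetical, and the only thing assumed about runclient.sh is the argument layout shown in the example.

```python
import shlex
from urllib.parse import urlencode


def wikipedia_printable_url(title):
    """Build the printable-view URL used in the example command."""
    query = urlencode({"title": title, "printable": "yes"})
    return "http://en.wikipedia.org/w/index.php?" + query


def build_invoke_args(service_name, doc_url, script="./runclient.sh"):
    """Return the argument list for an 'invoke' call (hypothetical helper).

    The literal double quotes around the service name mirror the escaped
    quotes in the example command; the docs= prefix is taken from it too.
    """
    return [script, "invoke", '"%s"' % service_name, "docs=" + doc_url]


if __name__ == "__main__":
    args = build_invoke_args(
        "Person and Location Extractor",
        wikipedia_printable_url("Stanley_Kubrick"),
    )
    # Print a shell-safe version of the command rather than running it.
    print(" ".join(shlex.quote(a) for a in args))
```

To actually run the client, the same list can be handed to subprocess.run from the command-line client's directory, which bypasses shell quoting altogether.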