UIMA stands for Unstructured Information Management Architecture.
UIMA lets a processing pipeline pass a document (a text document, for example) to multiple UIMA-compliant services and receive back a collection of annotations for that document. The annotations may include parts of speech, named entities and so on.
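To make the pipeline idea concrete, here is a minimal sketch in plain Python. It does not use any UIMA library: the names Annotation, token_annotator, capitalised_annotator and run_pipeline are all hypothetical, and the two "services" are toy local functions standing in for real UIMA-compliant components.

```python
import re
from typing import Callable, List, NamedTuple


class Annotation(NamedTuple):
    """A labelled character span over the document text (hypothetical type)."""
    begin: int
    end: int
    label: str


# An "annotator" here is just a function from text to annotations;
# a real UIMA service would be a remote component, not a local function.
Annotator = Callable[[str], List[Annotation]]


def token_annotator(text: str) -> List[Annotation]:
    """Toy tokeniser: every run of non-whitespace characters is a token."""
    return [Annotation(m.start(), m.end(), "Token")
            for m in re.finditer(r"\S+", text)]


def capitalised_annotator(text: str) -> List[Annotation]:
    """Toy stand-in for named-entity recognition: marks capitalised words."""
    return [Annotation(m.start(), m.end(), "Capitalised")
            for m in re.finditer(r"\b[A-Z]\w*", text)]


def run_pipeline(text: str, annotators: List[Annotator]) -> List[Annotation]:
    """Pass the same text to each annotator and collect all annotations."""
    collected: List[Annotation] = []
    for annotate in annotators:
        collected.extend(annotate(text))
    return collected


if __name__ == "__main__":
    doc = "Tokyo hosts NaCTeM partners"
    for ann in run_pipeline(doc, [token_annotator, capitalised_annotator]):
        print(ann.label, ann.begin, ann.end, repr(doc[ann.begin:ann.end]))
```

Each annotator sees the same unmodified text, and the caller ends up with one combined collection of annotations, which is the shape of result described above.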
The University of Tokyo and the National Centre for Text Mining (NaCTeM, UK) have collaborated to produce U-Compare, a cluster-hosted repository of UIMA components, due to launch at the end of this month.
UIMA uses SOAP for interaction between services.
From the draft specification:
"In UIMA the original content is not affected in the analysis process. Rather, an object graph is produced that stands off from and annotates the content. Stand-off annotations in UIMA allow for multiple content interpretations of graph complexity to be produced, co-exist, overlap and be retracted without affecting the original content representation."
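The stand-off idea can be illustrated with a small sketch, again in plain Python rather than the UIMA API. The Span and AnnotatedDocument classes are hypothetical names invented for this example. The point is only that annotations refer to the text by character offsets, so they can overlap and be retracted while the text itself stays untouched.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class Span:
    """A stand-off annotation: offsets into the text plus a label."""
    begin: int
    end: int
    label: str


@dataclass
class AnnotatedDocument:
    """Holds the original text and, separately, the annotations over it."""
    text: str                      # original content, never modified
    spans: Set[Span] = field(default_factory=set)

    def add(self, span: Span) -> None:
        self.spans.add(span)

    def retract(self, span: Span) -> None:
        # Removing an annotation leaves the text exactly as it was.
        self.spans.discard(span)

    def covered_text(self, span: Span) -> str:
        return self.text[span.begin:span.end]


doc = AnnotatedDocument("New York Times reporter")

# Two competing interpretations that overlap on the same characters.
place = Span(0, 8, "Location")        # "New York"
org = Span(0, 14, "Organisation")     # "New York Times"
doc.add(place)
doc.add(org)

# Retract one interpretation; the other, and the text, are unaffected.
doc.retract(place)
```

Because the annotations live outside the text, nothing has to be spliced into or out of the document when an interpretation is added or withdrawn.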
CAS (Common Analysis Structure) objects represent documents and their annotations; they can be passed between components in a standardised XML serialisation, XMI (XML Metadata Interchange), which replaces the earlier XCAS format.
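As a rough illustration of passing a document and its annotations between components as XML, the sketch below round-trips them through a deliberately simplified format. This is not the real XMI schema: the element names document, content and annotation, and the functions to_xml and from_xml, are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# (begin, end, label): a simplified stand-in for an annotation in a CAS.
SpanTuple = Tuple[int, int, str]


def to_xml(text: str, spans: List[SpanTuple]) -> str:
    """Serialise text plus stand-off annotations to a simplified XML string."""
    root = ET.Element("document")
    ET.SubElement(root, "content").text = text
    for begin, end, label in spans:
        ET.SubElement(root, "annotation",
                      begin=str(begin), end=str(end), label=label)
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_string: str) -> Tuple[str, List[SpanTuple]]:
    """Rebuild the text and annotations from the simplified XML string."""
    root = ET.fromstring(xml_string)
    text = root.findtext("content", default="")
    spans = [(int(a.get("begin")), int(a.get("end")), a.get("label"))
             for a in root.findall("annotation")]
    return text, spans


if __name__ == "__main__":
    xml_string = to_xml("Tokyo hosts NaCTeM partners",
                        [(0, 5, "Location"), (12, 18, "Organisation")])
    print(xml_string)
    print(from_xml(xml_string))
```

The receiving component gets back the same text and the same offsets, so annotations added by one component remain valid for the next.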