An open question to authors of text mining tools


Dear {creator of text mining tool},

I'm interested in using your software for entity extraction, but I'd like to know two things:

If the input document is XML, will the annotator preserve the structure of the XML, so that the resulting annotations can be stored in a stand-off document with the position of each annotation expressed relative to anchors in the original XML document?

Also, if the input document is XML, is it possible to specify that entities may be recognised across some element boundaries (superscript, subscript, bold, italic, etc.) but not across others (paragraphs, citation references, etc.)? MarkLogic has comparable settings, called "phrase-through" and "phrase-around".
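To make the request concrete, here is a minimal Python sketch of the behaviour I have in mind. It is an illustration under stated assumptions, not how any particular tool works: the element names in `PHRASE_THROUGH`, the `GAZETTEER` list standing in for a real recogniser, and the anchor format (an XPath to a text node plus a character offset) are all hypothetical, and namespaces, comments and processing instructions are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical: elements whose boundaries an entity may cross ("phrase-through").
# Every other element is treated as a hard boundary. Tag names are illustrative.
PHRASE_THROUGH = {"sup", "sub", "b", "i"}

# Hypothetical gazetteer standing in for a real entity recogniser.
GAZETTEER = ["H2O", "Homo sapiens"]

SAMPLE = (
    "<doc>"
    "<p>Water (H<sub>2</sub>O) is common.<cite>[1]</cite></p>"
    "<p><i>Homo</i> <i>sapiens</i> drink it.</p>"
    "<p>Homo</p><p>sapiens</p>"
    "</doc>"
)


def flatten(root):
    """Split the document into text segments that entities may not cross.

    Each segment is (text, anchors), where anchors[k] is the pair
    (XPath of the text node, character offset within that node)
    for character k of text.
    """
    segments, chars, anchors = [], [], []

    def flush():
        # Close the current segment at a hard boundary.
        if chars:
            segments.append(("".join(chars), anchors.copy()))
            chars.clear()
            anchors.clear()

    def emit(text, xpath):
        for offset, ch in enumerate(text):
            chars.append(ch)
            anchors.append((xpath, offset))

    def walk(elem, path):
        boundary = elem.tag not in PHRASE_THROUGH
        if boundary:
            flush()  # no entity may straddle the start of this element
        n_text = 0  # position among this element's non-empty text nodes
        if elem.text:
            n_text += 1
            emit(elem.text, f"{path}/text()[{n_text}]")
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")
            if child.tail:  # text after a child is a text node of the parent
                n_text += 1
                emit(child.tail, f"{path}/text()[{n_text}]")
        if boundary:
            flush()  # ...nor the end of it

    walk(root, f"/{root.tag}")
    return segments


def annotate(xml_string):
    """Return stand-off annotations as (entity, start_anchor, end_anchor)."""
    annotations = []
    for text, anchors in flatten(ET.fromstring(xml_string)):
        for name in GAZETTEER:
            for m in re.finditer(re.escape(name), text):
                start = anchors[m.start()]
                last_node, last_offset = anchors[m.end() - 1]
                end = (last_node, last_offset + 1)  # exclusive end position
                annotations.append((name, start, end))
    return annotations


if __name__ == "__main__":
    for record in annotate(SAMPLE):
        print(record)
```

Here "H2O" is found across the `<sub>` boundary and "Homo sapiens" across two `<i>` elements, with start and end anchors pointing into different text nodes of the original document. The same words split across two `<p>` elements are not matched, and the `<cite>` content is kept in a segment of its own.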

If the answer to these questions is "yes", I'd really like to hear about it.

Yours sincerely,