A UIMA Database Interface for Managing NLP-related Text Annotations

NLP and automatic text analysis necessarily involve the annotation of natural language texts. The Apache Unstructured Information Management Architecture (UIMA) framework is used in numerous projects, tools and resources and has become a de facto standard in this area. Although UIMA is widely used as a document-based schema, it provides no native database support. To facilitate distributed storage and to enable UIMA-based projects to perform targeted queries, we have developed the UIMA Database Interface (UIMA DI). UIMA DI provides an environment for the generic use of UIMA documents in database systems. In addition, integrating UIMA DI into rights and resource management tools enables user- and group-specific access to UIMA documents and thereby provides data protection. Finally, UIMA DI makes UIMA documents accessible to third-party programs. UIMA DI, which we evaluate against file-system-based storage, is available on GitHub under the GPLv3 license.
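The abstract does not describe UIMA DI's actual storage schema. As a rough illustration of what database-backed annotation storage with targeted queries and group-specific access can look like, the following Python sketch stores documents and their annotation spans in an in-memory SQLite database and returns the covered text of every annotation of one type that a user's groups may read. The table layout, the `owner_group` column and the functions `open_store`, `store_document` and `covered_texts` are hypothetical; they are not part of UIMA DI or the Apache UIMA API, and a real UIMA CAS carries typed feature structures with many more features than the begin/end offsets modeled here.

```python
import sqlite3
from typing import Iterable, List, Sequence, Tuple

# Hypothetical schema (NOT the UIMA DI schema): one row per document,
# one row per annotation span, indexed by annotation type for targeted queries.
SCHEMA = """
CREATE TABLE document (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    owner_group TEXT NOT NULL
);
CREATE TABLE annotation (
    doc_id INTEGER NOT NULL REFERENCES document(id),
    type TEXT NOT NULL,
    span_begin INTEGER NOT NULL,
    span_end INTEGER NOT NULL
);
CREATE INDEX idx_annotation_type ON annotation(type);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def store_document(
    conn: sqlite3.Connection,
    text: str,
    group: str,
    annotations: Iterable[Tuple[str, int, int]],
) -> int:
    """Store a document owned by `group` plus its (type, begin, end) annotations."""
    cur = conn.execute(
        "INSERT INTO document (text, owner_group) VALUES (?, ?)", (text, group)
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO annotation (doc_id, type, span_begin, span_end) "
        "VALUES (?, ?, ?, ?)",
        [(doc_id, t, b, e) for t, b, e in annotations],
    )
    conn.commit()
    return doc_id


def covered_texts(
    conn: sqlite3.Connection, ann_type: str, user_groups: Sequence[str]
) -> List[str]:
    """Return the covered text of all `ann_type` annotations in documents
    owned by one of `user_groups` (a simple group-based access check)."""
    if not user_groups:
        return []
    placeholders = ",".join("?" for _ in user_groups)
    rows = conn.execute(
        "SELECT d.text, a.span_begin, a.span_end "
        "FROM annotation a JOIN document d ON d.id = a.doc_id "
        f"WHERE a.type = ? AND d.owner_group IN ({placeholders}) "
        "ORDER BY a.doc_id, a.span_begin",
        (ann_type, *user_groups),
    ).fetchall()
    return [text[b:e] for text, b, e in rows]


if __name__ == "__main__":
    conn = open_store()
    store_document(conn, "Berlin is large.", "nlp", [("Token", 0, 6), ("NamedEntity", 0, 6)])
    store_document(conn, "Paris is old.", "admin", [("NamedEntity", 0, 5)])
    print(covered_texts(conn, "NamedEntity", ["nlp"]))  # ['Berlin']
```

The query filters by annotation type and by the reader's groups in a single SQL statement, so only the matching spans are retrieved instead of deserializing every stored document, which is the kind of targeted access that a file-system-based store of serialized documents does not offer directly.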
