Integrated NLP Evaluation System for Pluggable Evaluation Metrics with Extensive Interoperable Toolkit

To understand the key characteristics of NLP tools, it is important to evaluate them and compare them against other tools. Because NLP applications tend to consist of multiple semi-independent sub-components, evaluating complete systems is not always enough; a fine-grained evaluation of the underlying components is often worthwhile as well. Standardization of NLP components and resources matters not only for reusability but also because it allows individual components to be compared in terms of reliability and robustness across a wider range of target domains. However, because many evaluation metrics exist even within a single domain, any system that aims to aid inter-domain evaluation needs not just predefined metrics but also support for pluggable, user-defined metrics. Such a system would need to be based on an open standard so that a large number of components can be compared, and would ideally visualize the differences between components. We have developed a pluggable evaluation system based on the UIMA framework, which provides visualization useful for error analysis. It is a single integrated system that includes a large, ready-to-use, fully interoperable library of NLP tools.
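
The idea of pluggable, user-defined metrics can be illustrated with a minimal sketch. This is not the system's actual UIMA-based implementation. It assumes, purely for illustration, that each component's output is a set of (begin, end, label) spans, and the names `Span`, `METRICS`, `register_metric`, and `evaluate` are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical annotation representation: (begin offset, end offset, label).
Span = Tuple[int, int, str]
# A metric compares gold-standard spans with a component's predicted spans.
Metric = Callable[[Set[Span], Set[Span]], float]

# Registry holding both predefined and user-defined metrics.
METRICS: Dict[str, Metric] = {}


def register_metric(name: str) -> Callable[[Metric], Metric]:
    """Decorator that plugs a metric into the registry under `name`."""
    def decorator(fn: Metric) -> Metric:
        METRICS[name] = fn
        return fn
    return decorator


@register_metric("precision")
def precision(gold: Set[Span], predicted: Set[Span]) -> float:
    # Fraction of predicted spans that exactly match a gold span.
    return len(gold & predicted) / len(predicted) if predicted else 0.0


@register_metric("recall")
def recall(gold: Set[Span], predicted: Set[Span]) -> float:
    # Fraction of gold spans that the component recovered exactly.
    return len(gold & predicted) / len(gold) if gold else 0.0


@register_metric("f1")
def f1(gold: Set[Span], predicted: Set[Span]) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(gold, predicted), recall(gold, predicted)
    return 2 * p * r / (p + r) if p + r else 0.0


def evaluate(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Apply every registered metric to one component's output."""
    return {name: fn(gold, predicted) for name, fn in METRICS.items()}
```

Under these assumptions, a user adds a new metric by decorating a function with `register_metric`, and `evaluate` picks it up without further changes. Running `evaluate` once per component on the same gold set would give per-component scores that can be compared side by side.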
