Interoperability = f(community, division of labour)

This paper aims to motivate the hypothesis that practical interoperability can be seen as a function of whether and how stakeholder communities duplicate or divide work in a given area or market. We focus on language processing, an area that traditionally produces many diverse tools that are not immediately interoperable. At the same time, there is a strong desire to combine these tools into processing pipelines and to apply them to a wide range of corpora. The gap between generic, inherently "empty" interoperability frameworks, which offer no NLP capabilities themselves, and dedicated NLP tools has given rise to a new class of NLP-related projects that focus specifically on interoperability: component collections. This new class of projects drives interoperability in a very pragmatic way that could well prove more successful than, e.g., past efforts towards standardised formats, which ultimately saw little adoption or support by software tools.
