A Category Theory Approach to Interoperability

In this article, we propose a Category Theory approach to (syntactic) interoperability between linguistic tools. The resulting category consists of textual documents (including any linguistic annotations), NLP tools that analyze texts and add linguistic information, and format converters. Format converters are needed so that tools can read and produce different formats, which is the key to interoperability. The idea behind this work is the parallel between composition and associativity in Category Theory and the chaining of tools in NLP pipelines. We show how pipelines of linguistic tools can be modeled within the conceptual framework of Category Theory, and we successfully apply this method to two real-life examples.
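The correspondence between pipelines and composition can be illustrated with a minimal sketch. It is not the construction used in the article; it assumes that objects are document formats, that tools and converters are morphisms between formats, and that a pipeline is a composite morphism. The names `Morphism`, `then`, `identity`, the format labels `plain`, `tabular`, and `json`, and the three toy tools are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Morphism:
    """A tool or converter, viewed as an arrow between document formats (hypothetical model)."""
    name: str
    source: str  # format the tool reads
    target: str  # format the tool writes
    run: Callable[[dict], dict]

    def then(self, other: "Morphism") -> "Morphism":
        """Compose self with other; defined only when the formats line up."""
        if self.target != other.source:
            raise TypeError(f"cannot compose {self.name} ({self.target}) "
                            f"with {other.name} ({other.source})")
        return Morphism(f"{self.name};{other.name}", self.source, other.target,
                        lambda doc: other.run(self.run(doc)))


def identity(fmt: str) -> Morphism:
    """Identity arrow on a format: leaves the document unchanged."""
    return Morphism(f"id_{fmt}", fmt, fmt, lambda doc: doc)


# Toy tools (assumptions for illustration only).
tokenizer = Morphism(
    "tokenizer", "plain", "tabular",
    lambda d: {"format": "tabular", "text": d["text"],
               "layers": {**d["layers"], "tokens": d["text"].split()}})

converter = Morphism(
    "tab2json", "tabular", "json",
    lambda d: {**d, "format": "json"})

tagger = Morphism(
    "tagger", "json", "json",
    lambda d: {**d, "layers": {**d["layers"],
               "capitalized": [t[0].isupper() for t in d["layers"]["tokens"]]}})

doc = {"format": "plain", "text": "Category Theory helps", "layers": {}}

# Associativity: both groupings describe the same pipeline.
left = tokenizer.then(converter).then(tagger)
right = tokenizer.then(converter.then(tagger))
assert left.run(doc) == right.run(doc)
```

In this sketch the converter is what makes the tokenizer and the tagger composable: without it, `tokenizer.then(tagger)` raises a `TypeError` because the formats do not match, which mirrors the role the abstract assigns to format converters.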
