Towards the Automatic Merging of Lexical Resources: Automatic Mapping

Lexical resources are a critical component of Natural Language Processing applications. However, the high cost of comparing and merging different resources has been a bottleneck to obtaining richer resources with a broad range of potential uses for a significant number of languages. With the objective of reducing cost by eliminating human intervention, we present a new method for automating the merging of resources, with special emphasis on what we call the mapping step. This mapping step, which converts the resources into a common format that allows them to be merged later, is usually performed with huge manual effort and thus makes the whole process very costly. We therefore propose a method to perform this mapping fully automatically. To test our method, we have addressed the merging of two verb subcategorization frame lexica for Spanish. The results achieved, which almost replicate human work, demonstrate the feasibility of the approach.
