NLP lexicons: innovative constructions and usages for machines and humans

Lexical resources have undergone significant changes with the widespread use of computers and the advent of the Internet. However, while these changes amount to a revolution when machine-readable dictionaries are compared with their paper 'ancestors', machine-readable dictionaries compiled for human readers still have serious limitations. Natural language processing (NLP) lexicons, initially developed for NLP applications, have shed light on some of these shortcomings. In this presentation, we will attempt to contribute new elements to NLP approaches that aim to develop today's and tomorrow's lexical resources, in particular by using morphological and semantic information to improve access to lexical items. Special attention will be given to the semantic and multilingual aspects. We argue that present-day lexical resources 1) should be useful to both humans and machines, 2) can be constructed in ways that depart from classical lexicographic work, and 3) can provide novel modes of access and usage that are feasible only in the context of computer and user networks. These points will be illustrated with two resources under development: LexRom, as an example of morphological, form-based multilingual access, and the lexical network of JeuxDeMots, as an illustration of associative and semantic access.
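To make the two modes of access more concrete, here is a minimal Python sketch under loudly stated assumptions: the word lists, the prefix-matching rule, the association weights, and the function names `form_access` and `associative_access` are all hypothetical and purely illustrative. They are not the data models or algorithms of LexRom or JeuxDeMots. Form-based access is approximated by matching a typed form across several Romance languages; associative access is approximated by ranking the items linked to every clue word in a small weighted network.

```python
from collections import defaultdict

# Hypothetical toy lexicon (language code, word form); not taken from LexRom.
LEXICON = [
    ("fr", "porter"), ("fr", "portable"),
    ("es", "portar"), ("it", "portare"),
    ("pt", "portar"), ("fr", "pomme"),
]


def form_access(prefix):
    """Form-based access: return forms starting with `prefix`, grouped by language."""
    groups = defaultdict(list)
    for lang, word in LEXICON:
        if word.startswith(prefix):
            groups[lang].append(word)
    return dict(groups)


# Hypothetical weighted associations (source, target, weight), loosely in the
# spirit of a JeuxDeMots-style lexical network; all weights are invented.
ASSOCIATIONS = [
    ("chat", "souris", 80), ("chat", "animal", 60), ("chat", "ronronner", 50),
    ("fromage", "souris", 40), ("fromage", "lait", 70),
    ("animal", "souris", 30),
]


def associative_access(clues):
    """Associative access: rank targets linked to every clue by summed weight."""
    scores = defaultdict(int)
    reached_from = defaultdict(set)
    for source, target, weight in ASSOCIATIONS:
        if source in clues:
            scores[target] += weight
            reached_from[target].add(source)
    # Keep only targets reached from all clues, highest total weight first.
    candidates = [t for t in scores if reached_from[t] == set(clues)]
    return sorted(candidates, key=lambda t: scores[t], reverse=True)


if __name__ == "__main__":
    print(form_access("port"))
    print(associative_access(["chat", "fromage"]))
```

With these toy data, the clues "chat" and "fromage" together lead to "souris", the only item associated with both. Intersecting the neighbours of several clue words is one simple way to narrow down a target; a realistic network would also need typed relations and far larger, crowd-sourced data.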
