UDLex: Towards Cross-language Subcategorization Lexicons

This paper introduces UDLex, a computational framework for automatically extracting argument structures in multiple languages. By exploiting the Universal Dependencies annotation scheme, the system acquires subcategorization frames directly from a dependency-parsed corpus, regardless of the input language: a single set of language-independent rules detects the dependents of each verb in a sentence. We describe how the system was developed by adapting the LexIt framework (Lenci et al., 2012), originally designed to describe the argument structures of Italian predicates, and we discuss practical issues that arose when building argument structure representations for typologically different languages.
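As a rough illustration of what language-independent frame extraction over Universal Dependencies can look like, the sketch below reads a CoNLL-U sentence, collects the dependents of each verb whose relation label belongs to a fixed inventory, and joins them into a frame string. It is not the UDLex implementation: the relation inventory `ARG_RELS`, the `#`-separated frame notation, the `obl-<adposition>` slot naming, and the function names `parse_conllu` and `extract_frames` are all assumptions made for this example.

```python
from collections import defaultdict

# Relations treated as argument slots. This inventory is an assumption
# of the sketch, not the rule set used by UDLex.
ARG_RELS = ("nsubj", "obj", "iobj", "obl", "ccomp", "xcomp")

# A hand-written CoNLL-U sentence: "Mary gave a book to John."
SAMPLE = """\
1\tMary\tMary\tPROPN\t_\t_\t2\tnsubj\t_\t_
2\tgave\tgive\tVERB\t_\t_\t0\troot\t_\t_
3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_
4\tbook\tbook\tNOUN\t_\t_\t2\tobj\t_\t_
5\tto\tto\tADP\t_\t_\t6\tcase\t_\t_
6\tJohn\tJohn\tPROPN\t_\t_\t2\tobl\t_\t_
7\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""


def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword tokens (1-2) and empty nodes (1.1)
        current.append({
            "id": int(cols[0]),
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if current:
        sentences.append(current)
    return sentences


def extract_frames(sentence):
    """Return (verb lemma, frame string) pairs for one sentence."""
    children = defaultdict(list)
    for tok in sentence:
        children[tok["head"]].append(tok)

    frames = []
    for tok in sentence:
        if tok["upos"] != "VERB":
            continue
        slots = []
        for dep in children[tok["id"]]:
            # Drop language-specific subtypes such as "nsubj:pass"
            # (a simplification made for this sketch).
            rel = dep["deprel"].split(":")[0]
            if rel not in ARG_RELS:
                continue
            if rel == "obl":
                # Name oblique slots after their adposition, if any.
                case = [c["lemma"] for c in children[dep["id"]]
                        if c["deprel"] == "case"]
                rel = "obl-" + case[0] if case else "obl"
            slots.append(rel)
        # Order slots canonically so that word order does not matter.
        slots.sort(key=lambda s: ARG_RELS.index(s.split("-")[0]))
        frames.append((tok["lemma"], "#".join(slots) or "0"))
    return frames


if __name__ == "__main__":
    for sent in parse_conllu(SAMPLE):
        print(extract_frames(sent))
```

Because the rules refer only to Universal Dependencies labels and part-of-speech tags, the same function can be applied unchanged to a treebank in another language; counting the resulting pairs over a corpus would then give frame frequencies per verb.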

[1] Joakim Nivre, et al. Universal Stanford dependencies: A cross-linguistic typology, 2014, LREC.

[2] P. Resnik. Selectional constraints: an information-theoretic model and its computational realization, 1996, Cognition.

[3] Ted Briscoe, et al. A System for Large-Scale Acquisition of Verbal, Nominal and Adjectival Subcategorization Frames from Corpora, 2007, ACL.

[4] Stefan Evert, et al. Corpora and collocations, 2007.

[5] Slav Petrov, et al. A Universal Part-of-Speech Tagset, 2011, LREC.

[6] Christopher D. Manning, et al. The Stanford Typed Dependencies Representation, 2008, CF+CDPE@COLING.

[7] Karin Kipper Schuler, et al. Argument Realization, 2006, Comput. Linguistics.

[8] Noam Chomsky, et al. Aspects of the theory of syntax, 1965.

[9] Sabine Schulte im Walde. A Subcategorisation Lexicon for German Verbs induced from a Lexicalised PCFG, 2002, LREC.

[10] Alessandro Lenci, et al. LexFr: Adapting the LexIt Framework to Build a Corpus-based French Subcategorization Lexicon, 2016, LREC.

[11] Diana McCarthy, et al. Lexical acquisition at the syntax-semantics interface: diathesis alternations, subcategorization frames and selectional preferences, 2001.

[12] Sabine Schulte im Walde. The induction of verb frames and verb classes from corpora, 2009.

[13] Thierry Poibeau. Traitement automatique du contenu textuel, 2011.

[14] Joakim Nivre, et al. Towards a Universal Grammar for Natural Language Processing, 2015, CICLing.

[15] Piet Mertens. Restrictions de sélection et réalisations syntagmatiques dans DICOVALENCE: conversion vers un format utilisable en TAL, 2010.

[16] C. Fillmore, et al. Toward a Frame-Based Lexicon: The semantics of Risk and its Neighbors, 2015.

[17] Christiane Fellbaum, et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.

[18] Sandra A. Thompson, et al. Discourse Motivations for the Core-Oblique Distinction as a Language Universal, 1997.

[19] Silvia Bernardini, et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora, 2009, Lang. Resour. Evaluation.

[20] Anna Rumshisky, et al. Resolving polysemy in verbs: Contextualized distributional approach to argument semantics, 2008.

[21] Stephen Wechsler, et al. Lexicality and Argument Structure, 2015.

[22] Miriam R. L. Petruck. Frame Semantics, 1996.

[23] Roser Morante, et al. 4LEX: a Multilingual Lexical resource, 2005.

[24] Ted Briscoe, et al. A Large Subcategorization Lexicon for Natural Language Processing Applications, 2006, LREC.

[25] Charles J. Fillmore, et al. Frames and the semantics of understanding, 1985.

[26] Thierry Poibeau, et al. LexSchem: a Large Subcategorization Lexicon for French Verbs, 2008, LREC.

[27] Douglas Roland, et al. Verb Sense and Verb Subcategorization Probabilities, 2001.

[28] Thierry Poibeau, et al. Do we Still Need Gold Standards for Evaluation?, 2008, LREC.

[29] 김두식, et al. English Verb Classes and Alternations, 2006.

[30] Beth Levin, et al. English verb classes and alternations, 1993.

[31] Guy Aston, et al. Introducing the La Repubblica Corpus: A Large, Annotated, TEI(XML)-compliant Corpus of Newspaper Italian, 2004, LREC.

[32] Katrin Erk, et al. A Flexible, Corpus-Driven Model of Regular and Inverse Selectional Preferences, 2010, CL.

[33] Thierry Poibeau, et al. Acquisition de connaissances lexicales à partir de corpus : la sous-catégorisation verbale en français [Lexical acquisition from corpora: the case of subcategorization frames in French], 2010, TAL.

[34] Alessandro Lenci, et al. LexIt: A Computational Resource on Italian Argument Structure, 2012, LREC.

[35] Philip Resnik, et al. Cross-Language Parser Adaptation between Related Languages, 2008, IJCNLP.

[36] Pierre Marchal. Acquisition de schémas prédicatifs verbaux en japonais, 2015.

[37] Tiejun Zhao, et al. Subcategorization Acquisition and Evaluation for Chinese Verbs, 2004, COLING.

[38] Marc Light, et al. Statistical models for the induction and use of selectional preferences, 2002, Cogn. Sci.

[39] Martha Palmer, et al. Verbnet: a broad-coverage, comprehensive verb lexicon, 2005.

[40] Fred Karlsson. Finnish: An Essential Grammar, 1999.

[41] Anna Korhonen, et al. Automatic Lexical Classification – Balancing between Machine Learning and Linguistics, 2009, PACLIC.

[42] Montserrat Marimon, et al. MultiVal - towards a multilingual valence lexicon, 2014, LREC.

[43] Piet Mertens, et al. La valence: l'approche pronominale et son application au lexique verbal, 2003.