Learning English Light Verb Constructions: Contextual or Statistical

In this paper, we investigate a supervised machine learning framework for automatically learning English Light Verb Constructions (LVCs). Our system achieves 86.3% accuracy, against a chance baseline of 52.2%, when trained with groups of either contextual or statistical features. In addition, we present an in-depth analysis of these two feature groups and show that, although they are superficially different, systems trained on each reach similar empirical performance. However, when the surface structures of candidate LVCs are identical, the system trained with contextual features, which encode information about surrounding words, performs 16.7% better. We also construct a balanced benchmark dataset of 2,162 sentences from the British National Corpus (BNC) for English LVCs. This dataset is publicly available and is a useful computational resource for research on multiword expressions (MWEs) in general.
