Modeling the Statistical Idiosyncrasy of Multiword Expressions

The focus of this work is statistical idiosyncrasy (or collocational weight) as a discriminant property of multiword expressions (MWEs). We formalize and model this property, compile a two-class data set of MWE and non-MWE examples, and evaluate our models on this data set. We present a possible empirical implementation of collocational weight and study its effect on the identification and extraction of MWEs. Our models prove more effective than the baselines at identifying noun-noun MWEs.
