Lexical Variability and Compositionality: Investigating Idiomaticity with Distributional Semantic Models

In this work we carried out an idiom type identification task on a set of 90 Italian V-NP and V-PP constructions comprising both idioms and non-idioms. Lexical variants were generated from these expressions by replacing their components with semantically related words, extracted both distributionally and from the Italian section of MultiWordNet. In distributional semantic spaces, idiomatic phrases turned out to be less similar to their lexical variants than non-idiomatic phrases were. Several variant-based distributional measures of idiomaticity were tested. Our indices also proved reliable in identifying idioms whose lexical variants are poorly attested, or not attested at all, in our corpus.
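
To make the idea of a variant-based measure concrete, the following is a minimal sketch, not the paper's actual implementation. It assumes that each expression and each of its lexical variants is represented by a vector in a distributional space, and it scores an expression by the mean cosine similarity between its vector and those of its variants. The function names, the toy vectors, and the choice to count an unattested variant (passed as `None`) as similarity 0 are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_variant_similarity(target_vec, variant_vecs):
    """Mean cosine similarity between an expression and its lexical variants.

    Assumption for this sketch: a variant that is not attested in the corpus
    is passed as None and contributes a similarity of 0.0.
    """
    if not variant_vecs:
        raise ValueError("at least one variant is required")
    sims = [
        cosine(target_vec, v) if v is not None else 0.0
        for v in variant_vecs
    ]
    return sum(sims) / len(sims)


# Toy, hand-made vectors (hypothetical; not taken from any corpus).
idiom_vec = [1.0, 0.0, 0.0]
idiom_variants = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

literal_vec = [1.0, 1.0, 0.0]
literal_variants = [[1.0, 1.0, 0.1], [1.0, 0.9, 0.0]]

idiom_score = mean_variant_similarity(idiom_vec, idiom_variants)
literal_score = mean_variant_similarity(literal_vec, literal_variants)

# A lower score means the expression is less similar to its variants,
# which under the hypothesis above points towards idiomaticity.
print(idiom_score < literal_score)
```

Under this scheme a lower score suggests a more idiomatic expression, in line with the finding that idioms are less similar to their variants than non-idioms are. How unattested variants should be treated is a design choice; counting them as 0 is only one option.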
