Experiments in Idiom Recognition

Some expressions are ambiguous between idiomatic and literal interpretations depending on the context in which they occur, e.g., ‘sales hit the roof’ vs. ‘hit the roof of the car’. We present a novel method for classifying a given instance as literal or idiomatic, focusing on verb-noun constructions. We report state-of-the-art results on this task using an approach based on the hypothesis that the contexts of idiomatic usages are distributed differently from the contexts of literal usages. We represent contexts by projecting their words into a vector space. For comparison, we implement the methods of Fazly et al. (2009), Sporleder and Li (2009), and Li and Sporleder (2010b) and apply them to our data. We provide experimental results validating the proposed techniques.
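
The hypothesis that idiomatic and literal usages occur in differently distributed contexts can be illustrated with a small sketch. This is not the method of the paper, whose details are not given in the abstract. It is a simple nearest-centroid comparison under stated assumptions: the three-dimensional vectors in `TOY_VECTORS` are invented for illustration (a real system would use trained word embeddings), and the helper names `mean_vector`, `cosine`, and `classify` are hypothetical.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
# A real system would load trained word vectors instead.
TOY_VECTORS = {
    "sales": (0.9, 0.1, 0.0),
    "profits": (0.8, 0.2, 0.1),
    "prices": (0.85, 0.15, 0.05),
    "car": (0.1, 0.9, 0.1),
    "ladder": (0.0, 0.8, 0.3),
    "hail": (0.1, 0.7, 0.4),
}


def mean_vector(words):
    """Average the vectors of the known words in a context."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vecs:
        raise ValueError("no known words in context")
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(context, idiomatic_contexts, literal_contexts):
    """Label a context by whichever class centroid it is closer to."""
    ctx = mean_vector(context)
    idiom_centroid = mean_vector([w for c in idiomatic_contexts for w in c])
    literal_centroid = mean_vector([w for c in literal_contexts for w in c])
    if cosine(ctx, idiom_centroid) > cosine(ctx, literal_centroid):
        return "idiomatic"
    return "literal"


# Context words seen around 'hit the roof' in labeled examples (assumed data).
idiomatic_examples = [["sales", "profits"]]
literal_examples = [["car", "ladder"]]

print(classify(["prices"], idiomatic_examples, literal_examples))
print(classify(["hail"], idiomatic_examples, literal_examples))
```

In this toy setup, a context mentioning ‘prices’ falls nearer the idiomatic examples and a context mentioning ‘hail’ nearer the literal ones. Comparing centroids is only one way to compare context distributions; it ignores the spread of each class, which a fuller treatment would likely need to model.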

[1] Afsaneh Fazly, et al. Unsupervised Type and Token Identification of Idiomatic Expressions, 2009, CL.

[2] Caroline Sporleder, et al. Using Gaussian Mixture Models to Detect Figurative Language in Context, 2010, NAACL.

[3] Ekaterina Vylomova, et al. Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions, 2014, EMNLP.

[4] Caroline Sporleder, et al. Linguistic Cues for Distinguishing Literal and Non-Literal Usages, 2010, COLING.

[5] Caroline Sporleder, et al. Unsupervised Recognition of Literal and Non-Literal Use of Idiomatic Expressions, 2009, EACL.

[6] Afsaneh Fazly, et al. Pulling their Weight: Exploiting Syntactic Forms for the Automatic Identification of Idiomatic Expressions in Context, 2007.

[7] J. R. Firth, et al. A Synopsis of Linguistic Theory, 1930-1955, 1957.

[8] Anoop Sarkar, et al. A Clustering Approach for Nearly Unsupervised Recognition of Nonliteral Language, 2006, EACL.

[9] I. R. McCaig, et al. Oxford Dictionary of Current Idiomatic English, 1994.

[10] Eugenie Giesbrecht, et al. Automatic Identification of Non-Compositional Multi-Word Expressions using Latent Semantic Analysis, 2006.

[11] Paul M. B. Vitányi, et al. Normalized Web Distance and Word Similarity, 2009, Handbook of Natural Language Processing.

[12] I. Sag, et al. Idioms, 2015.

[13] Pavel Pudil, et al. Introduction to Statistical Pattern Recognition, 2006.

[14] Paul M. B. Vitányi, et al. The Google Similarity Distance, 2004, IEEE Transactions on Knowledge and Data Engineering.

[15] Anne Cutler, et al. The access and processing of idiomatic expressions, 1979.

[16] Suzanne Stevenson, et al. The VNC-Tokens Dataset, 2008.

[17] Andreas Langlotz, et al. Idiomatic Creativity: A Cognitive-Linguistic Model of Idiom-Representation And Idiom-Variation in English, 2006.

[18] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[19] Timothy Baldwin, et al. Multiword Expressions: A Pain in the Neck for NLP, 2002, CICLing.

[20] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[21] Caroline Sporleder, et al. A Cohesion Graph Based Approach for Unsupervised Recognition of Literal and Non-literal Use of Multiword Expressions, 2009, Graph-based Methods for Natural Language Processing.