Unsupervised Recognition of Literal and Non-Literal Use of Idiomatic Expressions

We propose an unsupervised method for distinguishing literal from non-literal usages of idiomatic expressions. The method determines how well a literal interpretation of the expression is linked to the overall cohesive structure of the discourse. If strong links can be found, the expression is classified as literal; otherwise, it is classified as idiomatic. We show that this method can help distinguish literal from non-literal usages, even for idioms that occur in canonical form.
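The decision rule described above can be illustrated with a minimal sketch. This is not the authors' implementation: the relatedness scores in `TOY_SCORES`, the function names, the threshold of 0.5, and the choice to use only the single strongest link are all assumptions made for illustration. A real system would substitute a corpus-based or lexicon-based semantic relatedness measure.

```python
from typing import Callable, Iterable

# Hypothetical hand-assigned relatedness scores in [0, 1], for illustration only.
TOY_SCORES = {
    frozenset(("ice", "frozen")): 0.9,
    frozenset(("ice", "lake")): 0.8,
    frozenset(("ice", "water")): 0.85,
    frozenset(("break", "crack")): 0.7,
}


def toy_relatedness(a: str, b: str) -> float:
    """Look up a toy score; unknown pairs are treated as unrelated (0.0)."""
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


def classify_usage(
    idiom_words: Iterable[str],
    context_words: Iterable[str],
    relatedness: Callable[[str, str], float] = toy_relatedness,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> str:
    """Return 'literal' if any component word of the expression links
    strongly to the surrounding context, otherwise 'non-literal'."""
    idiom = set(idiom_words)
    # Ignore context tokens that are themselves part of the expression.
    context = [c for c in context_words if c not in idiom]
    strongest = max(
        (relatedness(w, c) for w in idiom for c in context),
        default=0.0,
    )
    return "literal" if strongest >= threshold else "non-literal"


if __name__ == "__main__":
    idiom = ["break", "ice"]
    print(classify_usage(idiom, ["frozen", "lake", "water"]))
    print(classify_usage(idiom, ["party", "guests", "conversation"]))
```

Using only the strongest link is a deliberate simplification. Measuring how well the expression's words fit into the cohesive structure of the whole discourse, as the abstract describes, would require aggregating links across the context instead of taking a single maximum.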
