Brighter than Gold: Figurative Language in User Generated Comparisons

Comparisons are common linguistic devices used to indicate the likeness of two things. Often, this likeness is not meant in the literal sense; for example, “I slept like a log” does not imply that logs actually sleep. In this paper, we propose a computational study of figurative comparisons, or similes. Our starting point is a new large dataset of comparisons extracted from product reviews and annotated for figurativeness. We use this dataset to characterize figurative language in naturally occurring comparisons and to reveal linguistic patterns indicative of this phenomenon. We operationalize these insights and apply them to a new task with high relevance to text understanding: distinguishing between figurative and literal comparisons. Finally, we apply this framework to explore the social context in which figurative language is produced, showing that similes are more likely to accompany opinions expressing extreme sentiment, and that they are uncommon in reviews deemed helpful.
