Creating and Characterizing a Diverse Corpus of Sarcasm in Dialogue

The use of irony and sarcasm in social media allows us to study them at scale for the first time. However, their diversity has made it difficult to construct a high-quality corpus of sarcasm in dialogue. Here, we describe the process of creating a large-scale, highly diverse corpus of dialogue from online debate forums, and our novel methods for operationalizing classes of sarcasm in the form of rhetorical questions and hyperbole. We show that lexico-syntactic cues can reliably retrieve sarcastic utterances with high accuracy. To demonstrate the properties and quality of our corpus, we conduct supervised learning experiments with simple features, and show that we achieve both higher precision and higher F-measure than previous work on sarcasm in debate forum dialogue. We also apply a weakly supervised linguistic pattern learner and qualitatively analyze the linguistic differences in each class.
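As an illustration of what cue-based retrieval of candidate utterances might look like, the following is a minimal sketch and not the method used to build the corpus. The cue words, the "question followed by more text" heuristic, and the names `HYPERBOLE_CUE`, `QUESTION_THEN_TEXT`, `match_cues`, and `retrieve` are all assumptions introduced here for illustration. Candidates retrieved this way would still need human annotation before being treated as sarcastic.

```python
import re

# Hypothetical hyperbole cues (intensifiers and exaggerations).
# These are illustrative assumptions, not the cue set used for the corpus.
HYPERBOLE_CUE = re.compile(
    r"\b(?:wow|oh yeah|totally|absolutely|millions of)\b", re.IGNORECASE
)

# Assumed heuristic for rhetorical-question candidates: a question mark
# that is followed by more text from the same speaker, so the question
# does not end the post.
QUESTION_THEN_TEXT = re.compile(r"\?\s+\S")


def match_cues(utterance):
    """Return the list of candidate class labels whose cue fires."""
    labels = []
    if QUESTION_THEN_TEXT.search(utterance):
        labels.append("rhetorical_question_candidate")
    if HYPERBOLE_CUE.search(utterance):
        labels.append("hyperbole_candidate")
    return labels


def retrieve(utterances):
    """Keep only utterances that match at least one cue, with their labels."""
    results = []
    for utterance in utterances:
        labels = match_cues(utterance)
        if labels:
            results.append((utterance, labels))
    return results


if __name__ == "__main__":
    posts = [
        "Is that your evidence? Absolutely brilliant.",
        "I disagree with the second point.",
        "What do you think?",
    ]
    for utterance, labels in retrieve(posts):
        print(labels, utterance)
```

Only the first post in the usage example matches a cue. The final post is a question but ends the post, so the assumed heuristic treats it as a genuine question and does not retrieve it.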
