Effectiveness of data-driven induction of semantic spaces and traditional classifiers for sarcasm detection

Irony and sarcasm are two complex linguistic phenomena that are widely used in everyday language, especially on social media, but they pose serious challenges for automated text understanding. Many labeled corpora for sarcasm detection have been extracted from several sources, and sarcasm appears to be conveyed in different ways in different domains. Nonetheless, very little work has compared different methods across the available corpora; moreover, authors usually collect their own datasets to evaluate their own methods. In this paper, we show that sarcasm detection can be tackled by applying classical machine learning algorithms to input texts sub-symbolically represented in a Latent Semantic space. As a result, our study establishes both reference datasets and baselines for the sarcasm detection problem that the scientific community can use to test newly proposed methods.
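
The abstract describes the pipeline only at a high level: texts are mapped into a latent semantic space and a classical classifier is trained on the resulting vectors. The sketch below shows one plausible instance of that idea, not the paper's exact configuration. It assumes a naive whitespace tokenizer, smoothed TF-IDF weighting, a truncated SVD computed with NumPy, and a nearest-centroid classifier with cosine similarity standing in for the classical learners the paper evaluates. The function names `fit_lsa_classifier` and `predict` and the toy corpus are hypothetical.

```python
import numpy as np


def _tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption: no punctuation or emoji handling).
    return text.lower().split()


def _normalize_rows(M):
    # L2-normalize each row; all-zero rows stay zero instead of becoming NaN.
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.maximum(norms, 1e-12)


def fit_lsa_classifier(docs, labels, k):
    """Hypothetical pipeline: TF-IDF -> truncated SVD (LSA) -> nearest centroid."""
    vocab = sorted({t for d in docs for t in _tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-by-document count matrix (rows = terms, columns = documents).
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in _tokenize(d):
            counts[index[t], j] += 1.0

    # Smoothed IDF, log((1 + N) / (1 + df)) + 1, so no term weight drops to zero.
    df = np.count_nonzero(counts, axis=1)
    idf = np.log((1.0 + len(docs)) / (1.0 + df)) + 1.0
    A = counts * idf[:, None]

    # Truncated SVD: the k leading left singular vectors span the latent space.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]

    # Documents in latent coordinates (U_k^T a_j), one row per document.
    Z = _normalize_rows((U_k.T @ A).T)

    # One L2-normalized centroid per class, so scoring is cosine similarity.
    classes = sorted(set(labels))
    label_arr = np.array(labels)
    centroids = np.vstack([Z[label_arr == c].mean(axis=0) for c in classes])
    return {"index": index, "idf": idf, "U_k": U_k,
            "classes": classes, "centroids": _normalize_rows(centroids)}


def predict(model, text):
    """Fold a new text into the latent space and return the closest class."""
    q = np.zeros(len(model["index"]))
    for t in _tokenize(text):
        if t in model["index"]:  # out-of-vocabulary tokens are ignored
            q[model["index"][t]] += 1.0
    z = model["U_k"].T @ (q * model["idf"])
    z = z / max(np.linalg.norm(z), 1e-12)
    scores = model["centroids"] @ z
    return model["classes"][int(np.argmax(scores))]


if __name__ == "__main__":
    docs = ["oh great another monday",
            "great just what i needed another delay",
            "the train arrived on time",
            "the meeting ended on time"]
    labels = ["sarcastic", "sarcastic", "literal", "literal"]
    model = fit_lsa_classifier(docs, labels, k=4)
    print(predict(model, "great another delay"))      # sarcastic
    print(predict(model, "the bus arrived on time"))  # literal
```

On this toy corpus the two classes share no vocabulary and k equals the rank of the TF-IDF matrix, so the projection preserves inner products exactly. On real corpora k is typically far smaller than the vocabulary size, which is where the dimensionality reduction of LSA actually matters; the choice of k and of the downstream classifier would normally be tuned on held-out data.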
