Automatic detection of satire in Twitter: A psycholinguistic-based approach

In recent years, substantial effort has gone into developing methods for detecting figurative language, particularly irony and sarcasm. Few approaches, however, address satirical texts. Recognizing satire matters for sentiment analysis and Natural Language Processing (NLP) applications because satire can alter the meaning of a statement in varied and complex ways. On this basis, we propose a method that uses a wide variety of psycholinguistic features to distinguish satirical from non-satirical text. We trained a set of machine learning algorithms on these features to classify unseen data, and then conducted several experiments to identify which features are most relevant for detecting satirical texts. We evaluated the method on a corpus of satirical and non-satirical news collected from Mexican and Spanish Twitter accounts. Our proposal obtained encouraging results, with an F-measure of 85.5% for Mexico and 84.0% for Spain. Moreover, the experiments showed no significant difference between Mexican and Spanish satire.
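
The two ingredients named in the abstract, dictionary-style psycholinguistic features and the F-measure used for evaluation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny `LEXICON`, its category names, and the helper names `category_features` and `f_measure` are hypothetical and are not taken from the paper, which does not specify its feature set or its averaging scheme here.

```python
from collections import Counter

# Hypothetical mini-lexicon (category -> Spanish words). Real psycholinguistic
# dictionaries are far larger; these entries are illustrative only.
LEXICON = {
    "posemo": {"feliz", "bueno", "genial"},
    "negemo": {"triste", "malo", "terrible"},
    "negate": {"no", "nunca", "jamás"},
}


def category_features(text):
    """Return, per category, the fraction of tokens found in that category."""
    tokens = text.lower().split()
    counts = Counter()
    for tok in tokens:
        for cat, words in LEXICON.items():
            if tok in words:
                counts[cat] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {cat: counts[cat] / total for cat in LEXICON}


def f_measure(y_true, y_pred, positive=1):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


features = category_features("No es bueno nunca")
score = f_measure([1, 1, 0, 0], [1, 0, 1, 0])
```

In a setup like the one described, feature vectors of this kind would be passed to a classifier, and the F-measure would be computed on held-out tweets for each country's corpus.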
