Natural language processing based features for sarcasm detection: An investigation using bilingual social media texts

The presence of sarcasm in text can hamper the performance of sentiment analysis, so a key challenge is detecting whether a text contains sarcasm at all. This challenge is compounded when bilingual texts are considered, such as Malay social media data. In this paper, a feature extraction process is proposed to detect sarcasm in bilingual texts, more specifically public comments on economics-related posts on Facebook. Four categories of features that can be extracted using natural language processing are considered: lexical, pragmatic, prosodic and syntactic. We also investigated the use of idiosyncratic features to capture peculiar and odd comments. To determine the effectiveness of the proposed process, a non-linear Support Vector Machine was used to classify texts, represented in terms of the identified features, as either containing sarcastic content or not. The results demonstrate that a combination of syntactic, pragmatic and prosodic features produced the best performance, with an F-measure of 0.852.
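The process above has three stages: extract features from each comment, classify the resulting vectors with a non-linear SVM, and score the predictions by F-measure. A minimal Python sketch of those stages follows, under stated assumptions. The cue patterns in the hypothetical `extract_features` function (emoticons, exclamation marks, elongated words, all-caps words, repeated punctuation) are illustrative stand-ins for a few pragmatic and prosodic features, not the feature set used in this study. The Gaussian (RBF) kernel is assumed as a typical choice for a non-linear SVM, and `f_measure` computes F1 for the sarcastic class only, whereas a reported score may instead be averaged across classes.

```python
import math
import re
from typing import Dict, List

# Hypothetical cue patterns for illustration; not the paper's actual features.
EMOTICON_RE = re.compile(r"[:;=]-?[)(DPp]")     # e.g. :) ;-) :D :P
WORD_RE = re.compile(r"[^\W\d_]+")              # runs of letters (Unicode-aware)
ELONGATED_RE = re.compile(r"([^\W\d_])\1{2,}")  # same letter 3+ times, e.g. "sooo"
REPEATED_PUNCT_RE = re.compile(r"[!?]{2,}")     # e.g. "!!!" or "?!"


def extract_features(text: str) -> Dict[str, int]:
    """Count a few pragmatic and prosodic surface cues in one comment."""
    words = WORD_RE.findall(text)
    return {
        # Pragmatic cues (assumed examples)
        "emoticons": len(EMOTICON_RE.findall(text)),
        "exclamations": text.count("!"),
        # Prosodic cues (assumed examples)
        "elongated_words": sum(1 for w in words if ELONGATED_RE.search(w)),
        "all_caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        "repeated_punct": len(REPEATED_PUNCT_RE.findall(text)),
    }


def rbf_kernel(x: List[float], y: List[float], gamma: float = 0.1) -> float:
    """Gaussian (RBF) kernel, assumed here as the non-linear SVM kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def f_measure(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """F1 score (harmonic mean of precision and recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    comment = "Wow, GREAT policy!!! Sooo helpful :)"
    print(extract_features(comment))
    print(round(f_measure([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]), 3))
```

In practice each comment's feature values (for example `list(extract_features(text).values())`) would be handed to an SVM library such as LIBSVM rather than a hand-written classifier; the kernel function only shows what makes the SVM non-linear, namely that similarity between two comments decays with the squared distance between their feature vectors.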
