Automatic Irony Detection for Romanian Online News

In this paper, we propose a new corpus and an approach to evaluate and detect irony in online news published in Romanian. Irony detection depends on subjective perception, so human experience plays an important role in this field. We propose a supervised machine learning system based on a manually annotated Romanian corpus of ironic and non-ironic news and on a Romanian dictionary of commonly used words. We compare the best results obtained with several classification algorithms: Naïve Bayes, Logistic Regression, Linear SVC, Decision Trees, and Random Forest. We also discuss aspects of irony detection and the language resources involved, including a Romanian corpus of 25,841 non-ironic and 14,064 ironic news items collected from online sources, which is used by the proposed irony detection system. Furthermore, we present the results of a sentiment analysis system, also based on a supervised machine learning approach, applied to the 8,128 news items that make up our evaluation corpus. The aim of the paper is to present the proposed system as a practical solution for automatic irony detection in real-world Romanian applications, one that can provide useful information about how online media is perceived. The best result among the compared systems, obtained after processing the corpus and applying the filtering method, is over 91%, achieved by the Naïve Bayes algorithm.
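The abstract does not specify the features, how the dictionary of commonly used words is applied, or what the filtering method is. As an illustration only, the sketch below shows the best-performing technique named above, multinomial Naïve Bayes over bag-of-words counts, under the assumption (ours, not the paper's) that the common-word dictionary acts as a stopword filter. The names `COMMON_WORDS`, `tokenize`, and `MultinomialNB`, the tiny word set, and the toy headlines are all hypothetical and are not taken from the paper's corpus or code; it is written in plain Python rather than with any particular library.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-in for the paper's "dictionary of commonly used words".
# Assumption: it is used as a stopword filter; the abstract does not say so.
COMMON_WORDS = {"și", "în", "de", "la", "a", "o", "un", "cu", "pe", "că"}

# \w is Unicode-aware for str patterns, so Romanian diacritics are kept.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text, common_words=COMMON_WORDS):
    """Lowercase the text, split it into word tokens, and drop common words."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in common_words]


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Shared vocabulary across all classes, used for smoothing.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            denom = self.total_words[c] + self.alpha * v
            score = self.log_prior[c]
            for t in tokens:
                # Sum log-probabilities instead of multiplying probabilities.
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy usage with made-up headlines (NOT from the paper's corpus).
train_docs = [
    "ce surpriză, încă o promisiune genială a guvernului",
    "bravo, o idee genială și absolut surprinzătoare",
    "guvernul a aprobat bugetul pentru anul viitor",
    "ministerul a publicat raportul anual",
]
labels = ["ironic", "ironic", "non-ironic", "non-ironic"]

model = MultinomialNB().fit(train_docs, labels)
print(model.predict("o promisiune genială"))  # ironic
```

Log-probabilities are summed rather than probabilities multiplied, because long news items would otherwise underflow to zero; add-one smoothing (`alpha=1.0`) keeps a word seen in only one class from zeroing out the other class's score. A real system would train on the full annotated corpus and measure accuracy on a held-out split.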
