DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter

Identifying irony in user-generated social media content has a wide range of applications; however, Arabic content has so far received limited attention. To bridge this gap, this study builds a new open-domain Arabic corpus annotated for irony detection. We query Twitter using irony-related hashtags to collect ironic messages, which are then manually annotated by two linguists according to our working definition of irony. The challenges we encountered during annotation reflect the inherent limitations of interpreting Twitter messages, as well as the complexity of Arabic and its dialects. Once published, our corpus will be a valuable free resource for developing open-domain systems for automatic irony recognition in social media text written in Arabic and its dialects.
