Evaluating Distant Supervision for Subjectivity and Sentiment Analysis on Arabic Twitter Feeds

Supervised machine learning methods for automatic subjectivity and sentiment analysis (SSA) are problematic when applied to social media such as Twitter, since they do not generalise well to unseen topics. A possible remedy for this problem is to apply distant supervision (DS) approaches, which learn from large amounts of automatically annotated data. This research empirically evaluates the performance of DS approaches for SSA on Arabic Twitter feeds. Results for emoticon- and lexicon-based DS show a significant performance gain over a fully supervised baseline, especially for detecting subjectivity, where we achieve 95.19% accuracy, a 48.47% absolute improvement over previous fully supervised results.
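
To illustrate what "automatically annotated data" can mean in the emoticon-based setting, the following is a minimal sketch, not the procedure used in this work. It assumes that a tweet containing only positive or only negative emoticons can be given the corresponding noisy label, and that the emoticon is removed from the text so a classifier cannot simply memorise it. The emoticon sets and the function name `distant_label` are hypothetical and chosen for illustration only.

```python
import re

# Hypothetical emoticon sets, for illustration only; not the lists used in the paper.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'("}

# Longest emoticons first, so ":-)" is matched before the shorter ":)".
_EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(POSITIVE | NEGATIVE, key=len, reverse=True))
)


def distant_label(tweet):
    """Return (text_without_emoticons, noisy_label), or None if no label applies."""
    found = set(_EMOTICON_RE.findall(tweet))
    has_pos = bool(found & POSITIVE)
    has_neg = bool(found & NEGATIVE)
    # Skip tweets with no emoticon or with conflicting emoticons.
    if has_pos == has_neg:
        return None
    # Strip the emoticons and normalise whitespace.
    text = " ".join(_EMOTICON_RE.sub(" ", tweet).split())
    return (text, "positive" if has_pos else "negative")


if __name__ == "__main__":
    for t in ["great match :)", "so tired :-(", "no emoticon here", "mixed :) :("]:
        print(repr(t), "->", distant_label(t))
```

Tweets labelled this way could then serve as training data for a standard classifier; a lexicon-based variant would replace the emoticon sets with polarity word lists under the same assumptions.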
