Developing Resources For Sentiment Analysis Of Informal Arabic Text In Social Media

Abstract: Natural Language Processing (NLP) applications such as text categorization, machine translation, and sentiment analysis need annotated corpora and lexicons to check quality and performance. This paper describes the development of resources for sentiment analysis of Arabic text in social media. A distinctive feature of the corpora and lexicons developed is that they are derived from informal Arabic that does not conform to grammatical or spelling standards. We refer to Arabic social media content of this sort as Dialectal Arabic (DA): informal Arabic originating from, and potentially mixing, a range of different individual dialects. The paper describes the process adopted for developing corpora and sentiment lexicons for sentiment analysis within different social media, and the characteristics of the resulting resources. In addition to providing useful NLP data sets for Dialectal Arabic, the work also contributes to understanding the approach to developing corpora and lexicons.
