MIKA: A tagged corpus for modern standard Arabic and colloquial sentiment analysis

Sentiment analysis (SA) and opinion mining (OM) have attracted considerable research attention over the last decade, driven by the growing volume of online documents (especially reviews and comments) on social media such as blogs and social networks. Many attempts have been made to build corpora for SA, since such resources are a key factor in SA and OM systems. The need for these resources nevertheless persists, especially for morphologically rich languages (MRLs) such as Arabic. In this paper, we present MIKA, a multi-genre tagged corpus of Modern Standard Arabic (MSA) and colloquial Arabic. MIKA is manually collected and annotated at the sentence level with semantic orientation (positive, negative, or neutral). The annotation process uses a rich set of linguistically motivated features (contextual intensifiers, contextual shifters, and negation handling), syntactic features for conflicting phrases, and others. Our data focus on MSA and Egyptian dialectal Arabic. We report our efforts in manually building and annotating the sentiment corpus from different types of data, such as tweets and Arabic microblogs (hotel reservations, product reviews, and TV program comments).
