OCA: Opinion corpus for Arabic

Sentiment analysis is a challenging new task related to text mining and natural language processing. Although several studies currently address this theme, most focus on English texts, and the resources available for opinion mining (OM) in other languages are still limited. In this article, we present a new Arabic corpus for the OM task that has been made available to the scientific community for research purposes. The corpus contains 500 movie reviews collected from different web pages and blogs in Arabic: 250 are labeled as positive reviews and the other 250 as negative. Furthermore, different experiments have been carried out on this corpus using machine learning algorithms such as support vector machines and Naïve Bayes. The results obtained are very promising, and we are encouraged to continue this line of research. © 2011 Wiley Periodicals, Inc.
