Modeling Arabic subjectivity and sentiment in lexical space

Abstract

In spite of the vast amount of work on subjectivity and sentiment analysis (SSA), it is not yet clear how lexical information can best be modeled in a morphologically rich language. To bridge this gap, we report successful models targeting lexical input in Arabic, a language with very complex morphology. Specifically, we measure the impact of both gold and automatic segmentation on the task and build effective models that perform significantly better than our baselines. Our models exploiting predicted segments improve subjectivity classification by 6.02% F1-measure and sentiment classification by 4.50% F1-measure against the majority class baseline over surface word forms. We also perform in-depth (error) analyses of the models' behavior and provide detailed explanations of how subjectivity and sentiment are expressed in Arabic, against the background of morphological richness in which the work is situated.
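The abstract contrasts surface word forms with segmented lexical input and reports gains over a majority class baseline in F1. The following is a hypothetical sketch, not the authors' pipeline, that illustrates both ideas on toy data. It assumes a segmenter that marks clitics with "+" (the segmentation here is hand-written in Buckwalter transliteration), simple unigram count features, and macro-averaged F1; the abstract does not say which F1 averaging was used. The labels `OBJ` and `SUBJ` are illustrative placeholders.

```python
from collections import Counter

# Toy data in Buckwalter transliteration. ASSUMPTION: clitics are marked with
# "+", as a segmenter's output might be; this is NOT the paper's data or tool.
SURFACE = ["wAlktAb", "ktAb", "jmyl"]            # "and the book", "book", "beautiful"
SEGMENTED = ["w+", "Al+", "ktAb", "ktAb", "jmyl"]


def bag_of_words(tokens):
    """Unigram count features over whatever lexical unit the tokens represent."""
    return Counter(tokens)


def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test item."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test


def f1_for_label(gold, pred, label):
    """Per-class F1 from true positives, false positives and false negatives."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (ASSUMPTION: macro averaging)."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    print(bag_of_words(SURFACE)["ktAb"])    # 1: the stem is hidden inside "wAlktAb"
    print(bag_of_words(SEGMENTED)["ktAb"])  # 2: the stem is shared after segmentation
    train = ["OBJ", "OBJ", "SUBJ"]
    gold = ["OBJ", "SUBJ", "SUBJ", "OBJ"]
    pred = majority_baseline(train, len(gold))
    print(round(macro_f1(gold, pred), 4))   # 0.3333
```

The point of the toy counts is that segmentation lets one stem accumulate evidence across its cliticized variants, which sparse surface forms split apart. A majority class baseline ignores the input entirely, so under macro averaging it scores zero F1 on every class it never predicts.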
