SenZi: A Sentiment Analysis Lexicon for the Latinised Arabic (Arabizi)

Arabizi is an informal written form of dialectal Arabic transcribed in Latin alphanumeric characters. It is widely used on chat platforms and social media, yet it suffers from a severe lack of natural language processing (NLP) resources. As a result, texts written in Arabizi are often disregarded in sentiment analysis tasks for Arabic. In this paper, we describe the creation of a sentiment lexicon for Arabizi that was enriched with word embeddings. The result is a new Arabizi lexicon consisting of 11.3K positive and 13.3K negative words. We evaluated this lexicon by classifying the sentiment of Arabizi tweets, achieving an F1-score of 0.72. We provide a detailed error analysis to present the challenges that impact the sentiment analysis of Arabizi.
