A study of the performance of embedding methods for Arabic short-text sentiment analysis using deep learning approaches

Abstract

Sentiment analysis aims to classify a text according to the polarity of the opinion it expresses, such as positive, negative, or neutral. While most studies focus on eliciting features from English text, research on Arabic is limited because of the morphological and grammatical complexity of the Arabic language. In this paper, we propose an optimized sentiment classification approach for dialectal Arabic short text at the document level using deep learning (DL). The contributions of this paper are in three areas. First, we extract semantic features for Arabic short text at the word level and the character level. Second, we use three DL topologies for the classification models: a long short-term memory (LSTM) recurrent neural network; a convolutional neural network (CNN); and an ensemble model that combines the advantages of both to improve prediction performance. Third, we use a hyperparameter tuning method to optimize neural network performance. We trained and tested the proposed models on a dataset consisting of a Modern Standard Arabic and dialectal Arabic corpus collected from Twitter. The results showed a significant improvement in Arabic text classification in terms of classification accuracy, which ranges between 69.7% and 88%. The ensemble model scored the highest accuracy, 96.7%, on the test set.
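As a rough illustration of the difference between word-level and character-level input representations mentioned in the abstract, the following minimal Python sketch encodes two short texts both ways. The example sentences, the helper names `build_vocab` and `encode`, the padding and unknown-token ids, and the maximum lengths are all assumptions made for illustration; the abstract does not specify the preprocessing pipeline.

```python
from collections import Counter


def build_vocab(sequences, min_count=1):
    """Map each token to an integer id; 0 is reserved for padding, 1 for unknown tokens."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = {"<pad>": 0, "<unk>": 1}
    # Most frequent tokens get the smallest ids; ties are broken alphabetically.
    for tok, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(seq, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, and right-pad with 0."""
    ids = [vocab.get(tok, 1) for tok in seq][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical short texts, not taken from the paper's dataset.
texts = ["الخدمة ممتازة", "الخدمة سيئة جدا"]

# Word level: split on whitespace. Character level: every character is a token.
word_seqs = [t.split() for t in texts]
char_seqs = [list(t) for t in texts]

word_vocab = build_vocab(word_seqs)
char_vocab = build_vocab(char_seqs)

# Fixed-length integer sequences, the usual input to an embedding layer.
word_ids = [encode(s, word_vocab, max_len=4) for s in word_seqs]
char_ids = [encode(s, char_vocab, max_len=20) for s in char_seqs]
```

The character-level vocabulary stays small and has no out-of-vocabulary words, which is one reason character-level features are often considered for dialectal text with inconsistent spelling; the word-level vocabulary is larger but each id carries more meaning on its own.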
