Sentiment Analysis of Code-Mixed Bambara-French Social Media Text Using Deep Learning Techniques

The global growth of the Internet and the rapid expansion of social networks such as Facebook have made multilingual sentiment analysis of social media content increasingly necessary. This paper presents the first sentiment analysis of code-mixed Bambara-French Facebook comments. We develop four Long Short-Term Memory (LSTM)-based models and two Convolutional Neural Network (CNN)-based models, and evaluate these six models alongside Naïve Bayes and Support Vector Machines (SVM) on a dataset we constructed for this purpose. Social media text written in Bambara is scarce. To mitigate this limitation, we use dictionaries of character and word indexes to produce character and word embeddings in place of pre-trained word vectors. We investigate the effect of comment length on the models and compare their performance. The best-performing model is a one-layer CNN with an accuracy of 83.23%.
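As a rough illustration of the index-dictionary idea described above, the following Python sketch builds word and character indexes from raw comments and encodes a comment as fixed-length integer sequences, which a trainable embedding layer could then consume. It is not the paper's implementation: the function names, the reserved PAD and UNK indexes, the frequency-based ordering, and the toy comments are all assumptions made for this example.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved indexes (an assumption of this sketch)

def build_index(tokens):
    """Map each distinct token to an integer index, most frequent first."""
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically for determinism.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    # Start at 2 so that 0 and 1 stay reserved for PAD and UNK.
    return {tok: i + 2 for i, tok in enumerate(ordered)}

def encode(tokens, index, max_len):
    """Turn tokens into a fixed-length list of indexes, padded or truncated."""
    ids = [index.get(t, UNK) for t in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))

# Hypothetical toy comments, not taken from the paper's dataset.
comments = ["i ni ce merci", "merci beaucoup"]

# Word-level dictionary: one entry per whitespace-separated token.
word_index = build_index(w for c in comments for w in c.split())
# Character-level dictionary: one entry per character, spaces included.
char_index = build_index(ch for c in comments for ch in c)

word_ids = encode(comments[0].split(), word_index, max_len=6)
char_ids = encode(list(comments[0]), char_index, max_len=16)
```

Because the indexes are derived from the corpus itself, no external pre-trained vectors are required; the embedding weights attached to these indexes would be learned during training.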
