Sentiment Classification in Swahili Language Using Multilingual BERT

The evolution of the Internet has increased the amount of information that people express on different platforms. This information can take the form of product reviews, forum discussions, or social media posts. The accessibility of these opinions and feelings opens the door to opinion mining and sentiment analysis. As language and speech technologies have advanced, strong models have been obtained for many languages. However, due to linguistic diversity and a lack of datasets, African languages have been left behind. In this study, we use a current state-of-the-art model, multilingual BERT, to perform sentiment classification on Swahili datasets. The data were created by extracting and annotating 8.2k reviews and comments from different social media platforms and from the ISEAR emotion dataset. Each item was labeled as either positive or negative. The model was fine-tuned and achieved a best accuracy of 87.59%.
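As an illustration of the setup described above, the following is a minimal sketch of one fine-tuning step of multilingual BERT for binary sentiment classification, together with the accuracy metric reported here. It is not the authors' implementation. The checkpoint name `bert-base-multilingual-cased`, the learning rate, the maximum sequence length, the use of the Hugging Face `transformers` and PyTorch libraries, and the helper names `fine_tune_step` and `accuracy` are all assumptions made for illustration.

```python
from typing import Sequence

# Assumed checkpoint: a public multilingual BERT release; the source does not name one.
MODEL_NAME = "bert-base-multilingual-cased"
LABELS = ["negative", "positive"]  # binary labels as described in the abstract


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Return the percentage of predictions that match the gold labels."""
    if len(predicted) != len(gold) or len(gold) == 0:
        raise ValueError("predicted and gold must be non-empty and of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


def fine_tune_step(texts: Sequence[str], labels: Sequence[int], lr: float = 2e-5) -> float:
    """Run one gradient step on a batch and return the loss (hypothetical helper).

    The heavy imports are kept local so that the metric above can be used
    without installing torch or transformers.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=len(LABELS)
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    # Tokenize the batch; max_length=128 is an assumed value.
    batch = tokenizer(
        list(texts), padding=True, truncation=True, max_length=128, return_tensors="pt"
    )

    model.train()
    # Passing labels makes the model compute the cross-entropy loss itself.
    outputs = model(**batch, labels=torch.tensor(list(labels)))
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

In practice the model and optimizer would be created once and the step repeated over many batches and epochs; they are built inside the function here only to keep the sketch short. The `accuracy` helper returns a percentage, so a value such as 87.59 corresponds to the figure reported above.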
