A Sentiment Classification in Bengali and Machine Translated English Corpus

The resource constraints in many languages have made the multi-lingual sentiment analysis approach a viable alternative for sentiment classification. Although a good amount of research has been conducted using a multi-lingual approach in languages such as Chinese, Italian, and Romanian, very limited research has been done in Bengali. This paper presents a bilingual approach to sentiment analysis by comparing a machine-translated Bengali corpus to its original form. We apply multiple machine learning algorithms, namely Logistic Regression (LR), Ridge Regression (RR), Support Vector Machine (SVM), Random Forest (RF), Extra Randomized Trees (ET), and Long Short-Term Memory (LSTM), to a Bengali corpus and its machine-translated English version. The results suggest that using machine translation improves classifier performance in both datasets. Moreover, the unigram model performs better than higher-order n-gram models in both datasets. We attribute this to three factors: linguistic variation and misspelled words arising from the complex typing system of the Bengali language, sparseness and noise in the machine-translated data, and the small size of the datasets.
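The sparseness argument for unigrams can be illustrated with a minimal sketch. It is not the experimental setup of this paper: the four toy English reviews and the helper names `extract_ngrams` and `ngram_statistics` are hypothetical, chosen only to show how, on a small corpus, higher-order n-grams tend to be seen only once and so give a classifier little evidence per feature.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_statistics(documents, n):
    """Count distinct n-grams and the share of them that occur exactly once."""
    counts = Counter()
    for doc in documents:
        # Whitespace tokenization is an assumption made for brevity.
        counts.update(extract_ngrams(doc.lower().split(), n))
    singletons = sum(1 for c in counts.values() if c == 1)
    share = singletons / len(counts) if counts else 0.0
    return {"vocabulary": len(counts), "singleton_share": share}


# Hypothetical toy corpus standing in for a small review dataset.
reviews = [
    "the movie was very good",
    "the movie was very bad",
    "the acting was good",
    "the story was bad",
]

for n in (1, 2, 3):
    print(n, ngram_statistics(reviews, n))
```

On this toy corpus the share of features that occur only once grows as n increases, which is the kind of sparsity that can make higher-order n-gram models less reliable when training data is limited or noisy.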
