Analyzing Performance of Different Machine Learning Approaches With Doc2vec for Classifying Sentiment of Bengali Natural Language

Vector (numeric) representations of text documents have transformed natural language processing because they place semantically similar pieces of text close to one another in the vector space. This makes it easy to classify texts or to measure how similar they are. These vectors also capture how words and passages are used in context, so similarity can be found even between pairs of individual words. Whereas word2vec represents each word as a vector, doc2vec extends the idea by representing an entire sentence or document as a single vector. Representing a whole document as one vector allows many words or sentences to be compared at once, which can save computational power as well as bandwidth. Doc2vec, a relatively new technique, has not yet been applied to Bengali sentiment analysis, and its feasibility for this task is unknown. In this study, we trained a doc2vec model on a corpus of 7,000 Bengali sentences. The corpus contains two classes of data distinguished by polarity, i.e. positive and negative. We then applied several machine learning algorithms to the resulting document vectors and compared their classification accuracy. Among them, Bi-Directional Long Short-Term Memory (BLSTM) obtained the highest accuracy, 77.85%, with precision, recall and F1 score of 78.06%, 77.39% and 77.72% respectively.
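The abstract relies on two quantitative ideas: similar documents end up with nearby vectors, and the BLSTM result is summarized by precision, recall and F1. The Python sketch below illustrates both using only the standard library. It assumes cosine similarity as the closeness measure (a common choice for doc2vec vectors, though the abstract does not say which measure the study used) and treats label 1 as the positive class. The function names and toy vectors are hypothetical. In practice the vectors would come from a trained doc2vec model, for example gensim's `Doc2Vec`, and that training step is not reproduced here.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two document vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred, positive=1) -> Tuple[float, float, float]:
    """Binary precision, recall and F1, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correct positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # negatives predicted positive
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # positives missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


# Toy 3-dimensional "document vectors" (hypothetical values, not data from the study).
doc_pos_a = [0.9, 0.1, 0.3]
doc_pos_b = [0.8, 0.2, 0.35]
doc_neg = [-0.7, 0.6, 0.1]
# The two positive documents point in a similar direction, the negative one does not.
print(cosine_similarity(doc_pos_a, doc_pos_b) > cosine_similarity(doc_pos_a, doc_neg))

# Hypothetical gold labels and predictions (1 = positive, 0 = negative).
print(precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))

# Harmonic mean of the reported BLSTM precision and recall, in percent.
print(f"{f1_from_pr(78.06, 77.39):.2f}")
```

Applied to the reported BLSTM figures, the harmonic mean of 78.06% precision and 77.39% recall comes to about 77.72%, which is consistent with the F1 score stated in the abstract.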
