Paper information - Performing sentiment analysis in Bangla microblog posts

Performing sentiment analysis in Bangla microblog posts

Much of the research on sentiment analysis has been carried out for English, while work on Bangla has been limited to news corpora and blogs. Microblogging sites are becoming a valuable source of user-generated information in large volumes, as users express their views, opinions, and sentiments on various topics. In this paper, we aim to automatically extract the sentiments or opinions conveyed by users in Bangla microblog posts and then identify the overall polarity of each text as either negative or positive. We use a semi-supervised bootstrapping approach to develop the training corpus, which avoids the need for labor-intensive manual annotation. For classification, we use Support Vector Machine (SVM) and Maximum Entropy (MaxEnt) classifiers and compare the performance of these two machine learning algorithms by experimenting with combinations of various feature sets.
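
The abstract does not spell out the bootstrapping procedure, so the following is only a minimal sketch of one common form of the idea, under stated assumptions: a small seed lexicon labels the posts it matches, words that occur only in posts of one class are added to the lexicon, and the loop repeats. The function names (`tokenize`, `bootstrap_labels`), the `min_count` threshold, and the English toy posts are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def tokenize(text):
    # Whitespace tokenization; a real Bangla pipeline would need more care.
    return text.lower().split()


def bootstrap_labels(posts, pos_seeds, neg_seeds, rounds=3, min_count=2):
    """Label posts from seed words, growing the lexicon each round.

    Hypothetical illustration of semi-supervised bootstrapping,
    not the authors' actual procedure.
    """
    pos_words, neg_words = set(pos_seeds), set(neg_seeds)
    labels = {}  # post index -> "positive" / "negative"
    for _ in range(rounds):
        # Step 1: label every still-unlabeled post with a non-zero score.
        for i, post in enumerate(posts):
            if i in labels:
                continue
            tokens = tokenize(post)
            score = (sum(t in pos_words for t in tokens)
                     - sum(t in neg_words for t in tokens))
            if score > 0:
                labels[i] = "positive"
            elif score < 0:
                labels[i] = "negative"
        # Step 2: count, per class, how many labeled posts contain each word.
        pos_counts, neg_counts = Counter(), Counter()
        for i, label in labels.items():
            target = pos_counts if label == "positive" else neg_counts
            target.update(set(tokenize(posts[i])))
        # Step 3: keep words seen often in one class and never in the other.
        new_pos = {w for w, c in pos_counts.items()
                   if c >= min_count and neg_counts[w] == 0}
        new_neg = {w for w, c in neg_counts.items()
                   if c >= min_count and pos_counts[w] == 0}
        if new_pos <= pos_words and new_neg <= neg_words:
            break  # lexicon stopped growing
        pos_words |= new_pos
        neg_words |= new_neg
    return labels


toy_posts = [
    "great superb match",
    "great win superb",
    "superb play superb",
    "awful traffic jam",
    "awful jam again",
    "jam everywhere",
    "nothing here",
]
toy_labels = bootstrap_labels(toy_posts, {"great"}, {"awful"})
for index in sorted(toy_labels):
    print(index, toy_labels[index], "-", toy_posts[index])
```

In this toy run, the third and sixth posts contain no seed word and are labeled only after the lexicon grows in the first round. The automatically labeled posts could then serve as training data for the SVM and MaxEnt classifiers mentioned in the abstract.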

Wasifa Chowdhury | Shaika Chowdhury
