Performing sentiment analysis in Bangla microblog posts

Most research on sentiment analysis has been carried out on English text, whereas work on Bangla has been limited to news corpora and blogs. Microblogging sites are becoming a valuable source of user-generated information, as users publish large volumes of posts expressing their views, opinions, and sentiments on various topics. In this paper, we aim to automatically extract the sentiments or opinions conveyed by users in Bangla microblog posts and then identify the overall polarity of each text as either negative or positive. We use a semi-supervised bootstrapping approach to develop the training corpus, which avoids the need for labor-intensive manual annotation. For classification, we use Support Vector Machine (SVM) and Maximum Entropy (MaxEnt) classifiers and compare the performance of these two machine learning algorithms across various combinations of feature sets.
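
To illustrate the general idea of bootstrapping a labeled corpus without manual annotation, a minimal sketch follows. It is not the procedure used in this paper: the seed words, the whitespace tokenization, the promotion rule, and the names `bootstrap_labels`, `rounds`, and `min_count` are all assumptions made for illustration. The sketch labels posts that match exactly one of two small seed lexicons, then promotes words that recur in posts of only one polarity and repeats.

```python
from collections import Counter


def bootstrap_labels(posts, pos_seeds, neg_seeds, rounds=2, min_count=2):
    """Label posts by iteratively growing two seed lexicons.

    Hypothetical scheme for illustration only; not the paper's method.
    """
    pos, neg = set(pos_seeds), set(neg_seeds)
    labels = {}
    for _ in range(rounds):
        # Step 1: label each post whose tokens match exactly one lexicon.
        for i, post in enumerate(posts):
            tokens = set(post.split())
            has_pos, has_neg = bool(tokens & pos), bool(tokens & neg)
            if has_pos != has_neg:
                labels[i] = "positive" if has_pos else "negative"
        # Step 2: count in how many labeled posts of each polarity a token occurs.
        pos_counts, neg_counts = Counter(), Counter()
        for i, label in labels.items():
            target = pos_counts if label == "positive" else neg_counts
            target.update(set(posts[i].split()))
        # Step 3: promote tokens seen often in one polarity and never in the other.
        pos |= {t for t, c in pos_counts.items()
                if c >= min_count and t not in neg_counts}
        neg |= {t for t, c in neg_counts.items()
                if c >= min_count and t not in pos_counts}
    return labels


# Toy posts (assumed data): "ভালো" = good, "খারাপ" = bad.
posts = [
    "ছবিটা ভালো সুন্দর",
    "গানটা ভালো সুন্দর",
    "খাবারটা খারাপ বাজে",
    "সেবাটা খারাপ বাজে",
    "দিনটা সুন্দর",
    "রাস্তাটা বাজে",
    "আজ বৃষ্টি",
]
labels = bootstrap_labels(posts, pos_seeds={"ভালো"}, neg_seeds={"খারাপ"})
print(labels)
```

In this toy run the fifth and sixth posts contain no seed word, yet they receive labels in the second round because "সুন্দর" and "বাজে" are promoted after the first round. The last post matches neither lexicon and stays unlabeled. The resulting automatically labeled posts could then serve as training data for classifiers such as SVM or MaxEnt.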