Sentiwordnet for Bangla

Advances in NLP techniques over the last few years have created a great demand for tagging and analyzing sentiment in unstructured natural language data. A typical approach to sentiment analysis starts with a lexicon of positive and negative words and phrases, in which each entry is tagged with its prior, out-of-context polarity. Unfortunately, the efforts reported in the literature deal mostly with English texts. In this squib, we propose a computational technique for generating an equivalent SentiWordNet for Bengali from publicly available English sentiment lexicons and an English-Bengali bilingual dictionary. The target language for the present task is Bengali, though the methodology could be replicated for any new language.

Two lexical resources are widely used in English for sentiment analysis: SentiWordNet (Esuli et al., 2006) and the Subjectivity Word List (Wilson et al., 2005). SentiWordNet is an automatically constructed lexical resource for English that assigns a positivity score and a negativity score to each WordNet synset. The Subjectivity lexicon was compiled from manually developed resources augmented with entries learned from corpora. Its entries are labelled for part of speech (POS) and carry either a strong or a weak subjective tag, depending on the reliability of the subjective nature of the entry.
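The dictionary-based projection proposed above can be illustrated with a minimal sketch. It assumes the English lexicon is available as a mapping from a word to a (positivity, negativity) pair and the bilingual dictionary as a mapping from an English word to a list of Bengali translations. The function name `project_lexicon`, the toy entries (romanized Bengali placeholders), the scores, and the choice to average scores when several English words share a translation are all illustrative assumptions, not details taken from the method described here.

```python
from collections import defaultdict


def project_lexicon(english_lexicon, bilingual_dict):
    """Project (positivity, negativity) scores onto target-language words.

    english_lexicon: dict mapping an English word to a (pos, neg) tuple.
    bilingual_dict:  dict mapping an English word to a list of translations.
    Averaging over multiple source words is an assumption of this sketch.
    """
    collected = defaultdict(list)
    for en_word, scores in english_lexicon.items():
        # English words with no dictionary entry are skipped.
        for bn_word in bilingual_dict.get(en_word, []):
            collected[bn_word].append(scores)

    projected = {}
    for bn_word, score_list in collected.items():
        n = len(score_list)
        pos = sum(p for p, _ in score_list) / n
        neg = sum(q for _, q in score_list) / n
        projected[bn_word] = (pos, neg)
    return projected


# Hypothetical toy data: scores and romanized Bengali forms are placeholders.
english_lexicon = {
    "good": (0.75, 0.0),
    "nice": (0.5, 0.0),
    "bad": (0.0, 0.625),
    "awful": (0.0, 0.875),  # no translation below, so it is dropped
}
bilingual_dict = {
    "good": ["bhalo"],
    "nice": ["bhalo", "sundor"],
    "bad": ["kharap"],
}

bengali_lexicon = project_lexicon(english_lexicon, bilingual_dict)
for word, (pos, neg) in sorted(bengali_lexicon.items()):
    print(word, pos, neg)
```

In this sketch, a Bengali word reached from several English entries receives the mean of their scores, and English entries without a dictionary translation simply do not appear in the output. Other merging strategies, such as taking the maximum score or keeping sense-level entries separate, would fit the same interface.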