Topic Modelling and Sentiment Analysis with the Bangla Language: A Deep Learning Approach Combined with the Latent Dirichlet Allocation

This thesis investigates topic modelling and sentiment analysis for the Bangla language and makes two contributions, proposing separate models for each task. Much research exists on both tasks, but little of it addresses Bangla. Topic modelling is a powerful technique for the unsupervised analysis of large document collections. Various efficient topic modelling techniques are available for English, as it is one of the most widely spoken languages in the world, but not for many other languages. Bangla is the seventh most spoken native language in the world by population, yet it still lacks automated tools for many language processing tasks. The first contribution of this thesis is to find the core topics of a Bangla news corpus and to classify news articles using a similarity measure. To the best of our knowledge, this is the first tool for Bangla topic modelling. The document models are built using LDA (Latent Dirichlet Allocation) with bigrams. In recent years, people in Bangladesh have become heavily involved in social media, writing in Bangla. As part of this activity, people post their opinions about products and businesses on different social sites, of which Facebook is the most prominent. The second contribution is a sentiment analysis system: we collected Bangla comments from Facebook and applied a state-of-the-art algorithm to extract their sentiment. The proposed system aims to perform sentiment analysis efficiently, and we compare it with the existing sentiment analysis system for Bangla. Extracting sentiment from Bangla text is not straightforward, however, because of the language's complex grammatical structure. A deep learning based method was therefore applied to train the model and capture the underlying sentiment. The study focuses on word-level and character-level encoding, and compares the two in terms of model performance.
Overall, this thesis explores different algorithms and techniques for topic modelling and sentiment analysis in the Bangla language.
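
To make the input representations mentioned above concrete, the following is a minimal Python sketch of bigram construction (as used alongside LDA) and of word-level versus character-level encoding. It is an illustration under stated assumptions, not the implementation used in this thesis: the function names (word_units, char_units, bigrams, encode) and the sample sentence are hypothetical, tokenization is assumed to be plain whitespace splitting, and "character" is assumed to mean a Unicode code point, which for Bangla separates vowel signs from their consonants.

```python
def word_units(text):
    """Word-level units: split on whitespace (an assumed, simplistic tokenizer)."""
    return text.split()


def char_units(text):
    """Character-level units: one unit per Unicode code point, spaces included."""
    return list(text)


def bigrams(tokens):
    """Join each pair of adjacent tokens into a single bigram token."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def encode(units, vocab):
    """Map units to integer ids, growing vocab as new units appear.

    Ids start at 1 so that 0 can be reserved for padding.
    """
    return [vocab.setdefault(u, len(vocab) + 1) for u in units]


# Hypothetical sample comment, roughly "very good product".
sample = "খুব ভালো পণ্য"

words = word_units(sample)
word_ids = encode(words, {})              # one id per word
char_ids = encode(char_units(sample), {})  # one id per code point
word_bigrams = bigrams(words)             # adjacent word pairs for the topic model
```

The word-level sequence is short but needs a large vocabulary, whereas the character-level sequence is longer but draws on a small, closed set of symbols. This trade-off is one reason the two encodings can lead to different model performance.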