Sentiment Analysis of Twitter Data using Text Mining and Hybrid Classification Approach

Opinion mining has become an important task, and the rise of social media has made it a huge source of data for this purpose. Since almost everybody in the modern era uses some social media platform, the public mood is strongly reflected there. This thesis proposes to use this source of information to predict public sentiment towards a particular topic. The topic studied here is the food price crisis. Live tweets of Indian origin are extracted from Twitter using the Tweepy library for the Twitter API, with OAuth used for authentication, and are filtered by specific keywords and by location using latitude and longitude data. The tweets are saved to a database. They are first preprocessed to remove spam, special characters, URLs, short words, etc. The tweets are then stemmed and tokenized, and a TF-IDF score is calculated for every keyword. Feature selection is applied using chi-square and information gain. A term-document matrix (TDM) is created and fed to the classifiers. Two classifiers are analysed in this thesis, KNN and Naive Bayes, and a hybrid is built from them. The results of both classifiers are found to be satisfactory, while the hybrid KNN outperforms the Naive Bayes classifier in terms of accuracy. Thus a novel method is designed for opinion mining of Indian tweets regarding the food price crisis.
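As an illustration of the preprocessing and TF-IDF steps, a minimal Python sketch is given below. It is not the implementation used in the thesis. The function names `preprocess` and `tf_idf`, the minimum word length of 3, the sample tweets, and the unsmoothed weighting tf × log(N/df) are all assumptions made for this example, and spam filtering and stemming are omitted for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for this sketch: URLs, then anything that is not a letter or space.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess(tweet, min_len=3):
    """Lowercase a tweet, strip URLs and special characters, drop short words."""
    text = URL_RE.sub(" ", tweet.lower())
    text = NON_ALPHA_RE.sub(" ", text)
    return [tok for tok in text.split() if len(tok) >= min_len]


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed weighting: (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up sample tweets, for illustration only.
tweets = [
    "Onion prices are too high!! http://example.com #inflation",
    "Good news: onion prices fall this week",
    "Govt must act on food prices",
]
tokens = [preprocess(t) for t in tweets]
weights = tf_idf(tokens)
```

Under this weighting, a term that occurs in every tweet (here `prices`) receives a score of zero, while terms confined to a single tweet score highest. Each dictionary in `weights` corresponds to one row of a term-document matrix of the kind passed to the classifiers.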