Paper Information - Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis


Twitter is a microblogging platform where millions of users share their attitudes, views, and opinions every day. A probabilistic Latent Dirichlet Allocation (LDA) topic model is a computationally efficient way to identify the most popular topics in a large set of tweets. Sentiment analysis is an effective method for identifying the emotions and sentiments expressed in each tweet and for summarizing the results in a form that is easy to understand. The primary goal of this paper is to explore text mining, that is, extracting and analyzing useful information from unstructured text, by applying two approaches, LDA topic modelling and sentiment analysis, to plain-text Twitter data in English. Together, these two methods allow data to be mined more effectively and efficiently. LDA topic models and sentiment analysis can also be applied to provide insights in business and scientific fields.

Keywords: Text mining, Twitter, topic model, sentiment analysis.
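The two approaches named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the hyperparameter defaults, the sample tweets, and the tiny `TOY_LEXICON` are all assumptions made for illustration. Topic inference here uses collapsed Gibbs sampling, one common way to fit LDA and not necessarily the one used in the paper, and the sentiment step simply counts lexicon hits per tweet.

```python
import random
import re
from collections import Counter

# Hypothetical toy lexicon; a real study would load a published lexicon.
TOY_LEXICON = {
    "love": "positive", "great": "positive", "happy": "positive",
    "hate": "negative", "awful": "negative", "sad": "negative",
}

def tokenize(tweet):
    """Lowercase a tweet and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", tweet.lower())

def sentiment_counts(tweet, lexicon=TOY_LEXICON):
    """Count lexicon hits per sentiment label in one tweet."""
    return Counter(lexicon[w] for w in tokenize(tweet) if w in lexicon)

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]       # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                     # tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

if __name__ == "__main__":
    tweets = [
        "I love this great phone",
        "Awful traffic, I hate mondays",
        "Great battery on the new phone",
        "Sad about the traffic jam this morning",
    ]
    docs = [tokenize(t) for t in tweets]
    for k, counts in enumerate(lda_gibbs(docs, n_topics=2)):
        print("topic", k, [w for w, _ in counts.most_common(3)])
    for t in tweets:
        print(dict(sentiment_counts(t)), "<-", t)
```

With only a handful of short tweets the sampled topics are not meaningful; the sketch is meant to show the mechanics. In practice stop words would be removed first and an established LDA library would be preferable to a hand-written sampler.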

Haiyi Zhang | Sidi Yang
