A large amount of digital text is generated every day, and searching, managing, and exploring this text effectively has become a major challenge. In this paper, we first present an introduction to text mining and to Latent Dirichlet Allocation (LDA), a probabilistic topic model. We then present two experiments: topic modelling of Wikipedia articles and topic modelling of Twitter users' tweets. The first builds a document topic model, aiming to provide a topic-based approach to searching, exploring, and recommending articles. The second builds a user topic model, providing a thorough analysis of Twitter users' interests. The experimental process, including data collection, data pre-processing, and model training, is fully documented and commented. Furthermore, the conclusions and applications of this paper could serve as a useful computational tool for social and business research.
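As an illustration of the pre-processing and model-training steps, the following is a minimal sketch of LDA fitted with collapsed Gibbs sampling, written in plain Python with only the standard library. It is not the paper's implementation. The inference method, the hyperparameters `alpha` and `beta`, the tiny stopword list, and the choice to pool each user's tweets into a single document for the user topic model are all assumptions made for illustration, and the helper names (`preprocess`, `lda_gibbs`, `top_words`) are hypothetical.

```python
import random
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "with"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and one-letter tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (one common inference method; assumed here).

    docs: list of token lists.
    Returns (doc_topic, topic_word, vocab), where doc_topic[d][k] and
    topic_word[k][v] are token counts from the final sampling sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]       # n_{d,k}: tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # n_{k,v}: times word v is assigned to topic k
    topic_total = [0] * K                     # n_k: tokens assigned to topic k
    assignments = []                          # z_{d,i}: topic of token i in doc d

    # Start with a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional p(z_i = j | other assignments), up to a constant.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][v] + beta) / (topic_total[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


if __name__ == "__main__":
    # Toy data; pooling all of a user's tweets into one document is an assumption.
    tweets_by_user = {
        "user_a": ["The match was great, football fans cheering",
                   "Football goal in the final match"],
        "user_b": ["New python release with faster code",
                   "Writing code and python tests"],
    }
    user_ids = list(tweets_by_user)
    docs = [preprocess(" ".join(tweets_by_user[u])) for u in user_ids]
    doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
    for user, counts in zip(user_ids, doc_topic):
        total = sum(counts)
        print(user, [round(c / total, 2) for c in counts])
    print(top_words(topic_word, vocab))
```

Normalising each row of `doc_topic` gives an estimate of the topic mixture of a document (or, with pooling, of a user), and sorting each row of `topic_word` gives the most probable words of a topic. On real corpora, an optimised library implementation would normally be preferred over this pure-Python loop.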