Microblog Hot Spot Mining Based on PAM Probabilistic Topic Model

Microblogs are short texts that carry limited information, which makes topic mining more difficult. This paper proposes using the PAM (Pachinko Allocation Model) probabilistic topic model, a generative model of a text's implicit themes, for microblog hot spot mining. First, three categories of microblogs and the main contributions of this paper are presented. Second, four topic models are explained, and the PAM model is introduced in detail in terms of how it generates a document, its document classification accuracy, and the topic correlations it captures. Finally, MapReduce is described. Although the number of microblogs and the number of contactors are both huge, the total number of words is relatively small. With MapReduce, microblog data are split by contactor, so the document-topic count matrix and the contactor-topic count matrix can be stored locally, while the word-topic count matrix must be stored globally. Hot spot mining can thus be achieved on the basis of the PAM probabilistic topic model.
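The storage split described for MapReduce can be illustrated with a minimal single-process sketch. It is not the paper's implementation. The function names (`split_by_contactor`, `map_shard`, `reduce_word_topic`) and the `assign_topic` callback are hypothetical, and the callback merely stands in for a real sampler. The sketch also uses flat topic ids rather than PAM's DAG of super-topics and sub-topics.

```python
import zlib
from collections import Counter, defaultdict


def split_by_contactor(posts, n_shards):
    """Route each (doc_id, contactor, words) post to a shard chosen by its contactor."""
    shards = [[] for _ in range(n_shards)]
    for post in posts:
        contactor = post[1]
        # crc32 gives a stable hash, so a contactor always lands in the same shard.
        shards[zlib.crc32(contactor.encode("utf-8")) % n_shards].append(post)
    return shards


def map_shard(posts, assign_topic):
    """Process one shard: keep per-document and per-contactor counts locally
    and emit word-topic counts for the global merge."""
    doc_topic = defaultdict(Counter)        # local: doc id -> topic counts
    contactor_topic = defaultdict(Counter)  # local: contactor -> topic counts
    word_topic = Counter()                  # emitted: (word, topic) -> count
    for doc_id, contactor, words in posts:
        for word in words:
            topic = assign_topic(word)      # placeholder for a real sampler
            doc_topic[doc_id][topic] += 1
            contactor_topic[contactor][topic] += 1
            word_topic[(word, topic)] += 1
    return doc_topic, contactor_topic, word_topic


def reduce_word_topic(partial_counts):
    """Sum the word-topic counts emitted by all shards into one global table."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total
```

Because every post by a given contactor falls in the same shard, the document-topic and contactor-topic counts never need to leave that shard. Only the word-topic table is shared, and its size is bounded by the vocabulary, which the abstract notes is relatively small.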
