Paper Information - A Unified Generative Model for Characterizing Microblogs' Topics

In this paper, we focus on characterizing microblogs' topics with topic models. Unlike traditional textual media (such as news documents), microblogs pose three modeling challenges: 1) heavy noise; 2) short text; and 3) incomplete content. Previous work has investigated these limitations separately: some filters the noise through a prior classification, some enriches the text with the user's blog history, and some exploits the social network. However, none of these approaches addresses all three limitations simultaneously. To solve this problem, we combine the previous work and propose a unified generative model for characterizing microblogs' topics, in which all three limitations are addressed. A collapsed Gibbs sampling method is derived for estimating the parameters. Through both qualitative and quantitative analysis on Twitter data, we demonstrate that our approach consistently outperforms previous methods by a significant margin.
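For readers unfamiliar with the inference technique named in the abstract, the following is a minimal sketch of collapsed Gibbs sampling for plain Latent Dirichlet Allocation. It is not the unified model proposed in the paper, whose sampling equations are not given on this page. The function name `collapsed_gibbs_lda`, the hyperparameter defaults, and the toy corpus are all assumptions made for illustration.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                        iterations=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative, not the paper's model).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total tokens per topic
    z = []                                                # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalised.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and add the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, z


# Hypothetical toy corpus: word ids 0-2 and 3-5 never co-occur in a document.
docs = [[0, 1, 0, 2], [3, 4, 3, 5], [0, 2, 1, 1], [4, 5, 3, 3]]
n_dk, n_kw, z = collapsed_gibbs_lda(docs, num_topics=2, vocab_size=6, iterations=50)
```

The topic and word distributions are integrated out, so the sampler only tracks count tables; point estimates can be recovered afterwards by smoothing `n_dk` with `alpha` and `n_kw` with `beta` and normalising each row.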
