Modeling a Retweet Network via an Adaptive Bayesian Approach

Twitter (and similar microblogging services) has become a central nexus for discussion of the topics of the day. Twitter data contains rich content and structured information on users' topics of interest and behavior patterns. Correctly analyzing and modeling Twitter data enables the prediction of user behavior and preferences in a variety of practical applications, such as tweet recommendation and followee recommendation. Although a number of models have been developed for Twitter data in prior work, most of them model only users' tweets and neglect the valuable retweet information in the data. Incorporating users' retweet content as well as their retweet behavior would enhance the predictive power of such models. In this paper, we propose two novel Bayesian nonparametric models, URM and UCM, for retweet data. Both integrate the analysis of tweet text and users' retweet behavior in the same probabilistic framework, and both jointly model users' interests in tweets and retweets. As nonparametric models, URM and UCM automatically determine their model parameters from the input data, avoiding arbitrary parameter settings. Extensive experiments on real-world Twitter data show that both URM and UCM outperform all the baselines, and that UCM further outperforms URM, confirming the appropriateness of our models for retweet modeling.
