On predicting the popularity of newly emerging hashtags in Twitter

Because of Twitter’s popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., Naive bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.

[1]  Scott Counts,et al.  Identifying topical authorities in microblogs , 2011, WSDM '11.

[2]  Chenliang Li,et al.  Twevent: segment-based event detection from tweets , 2012, CIKM.

[3]  Jon M. Kleinberg,et al.  Group formation in large social networks: membership, growth, and evolution , 2006, KDD '06.

[4]  Mike Thelwall,et al.  Sentiment in Twitter events , 2011, J. Assoc. Inf. Sci. Technol..

[5]  A. Bernardo,et al.  Huberman, Romero, and Wu, Fang. . Social Networks that Matter: Twitter Under the Microscope. , 2008 .

[6]  Jure Leskovec,et al.  Microscopic evolution of social networks , 2008, KDD.

[7]  Evgeniy Gabrilovich,et al.  Predicting web searcher satisfaction with existing community-based answers , 2011, SIGIR.

[8]  Fernando Diaz,et al.  Time is of the essence: improving recency ranking using Twitter data , 2010, WWW '10.

[9]  Vikas Sindhwani,et al.  Emerging topic detection using dictionary learning , 2011, CIKM '11.

[10]  Miles Efron,et al.  Hashtag retrieval in a microblogging environment , 2010, SIGIR.

[11]  Christos Faloutsos,et al.  Graphs over time: densification laws, shrinking diameters and possible explanations , 2005, KDD '05.

[12]  Yutaka Matsuo,et al.  Earthquake shakes Twitter users: real-time event detection by social sensors , 2010, WWW '10.

[13]  Fang Wu,et al.  Social Networks that Matter: Twitter Under the Microscope , 2008, First Monday.

[14]  Virgílio A. F. Almeida,et al.  From bias to opinion: a transfer-learning approach to real-time sentiment analysis , 2011, KDD.

[15]  W. Bruce Croft,et al.  A framework to predict the quality of answers with non-textual features , 2006, SIGIR.

[16]  Xiaohui Yu,et al.  ARSA: a sentiment-aware model for predicting sales performance using blogs , 2007, SIGIR.

[17]  Qi He,et al.  TwitterRank: finding topic-sensitive influential twitterers , 2010, WSDM '10.

[18]  Ari Rappoport,et al.  What's in a hashtag?: content based prediction of the spread of ideas in microblogging communities , 2012, WSDM '12.

[19]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[20]  Jon Kleinberg,et al.  Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on twitter , 2011, WWW.

[21]  Hila Becker,et al.  Hip and trendy: Characterizing emerging trends on Twitter , 2011, J. Assoc. Inf. Sci. Technol..

[22]  Junghoo Cho,et al.  Topical semantics of twitter links , 2011, WSDM '11.

[23]  S. Utz,et al.  Is the medium the message? Perceptions of and reactions to crisis communication on twitter, blogs and traditional media , 2011 .

[24]  Gao Cong,et al.  Will this #hashtag be popular tomorrow? , 2012, SIGIR '12.

[25]  Ciro Cattuto,et al.  Dynamical classes of collective attention in twitter , 2011, WWW.

[26]  David Hawking,et al.  New-web search with microblog annotations , 2010, WWW '10.

[27]  Xiaolong Wang,et al.  Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach , 2011, CIKM '11.

[28]  Lei Yang,et al.  We know what @you #tag: does the dual role affect hashtag adoption? , 2012, WWW.

[29]  Jiuchang Wei,et al.  Estimating the diffusion models of crisis information in micro blog , 2012, J. Informetrics.

[30]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..