Scalable Social Analytics for Live Viral Event Prediction

Large-scale, predictive social analytics have proven effective. Over the last decade, research and industrial efforts have recognized the potential value of inferences based on online behavior analysis, sentiment mining, influence analysis, epidemic spread, etc. The majority of these efforts, however, are not yet designed with real-time responsiveness as a first-order requirement. Typical systems perform a post-mortem analysis on volumes of historical data and validate their “predictions” against events that have already occurred. We observe that in many applications, real-time predictions are critical, and delays of hours (or even minutes) can reduce their utility. For example, political campaigns could react very quickly to a scandal spreading on Facebook; content distribution networks (CDNs) could prefetch videos that are predicted to go viral soon; online advertisement campaigns could be corrected to enhance consumer reception. This paper proposes CrowdCast, a cloud-based framework that enables real-time analysis and prediction from streaming social data. As an instantiation of this framework, we tune CrowdCast to observe tweets on Twitter and predict which YouTube videos are most likely to “go viral” in the near future. To this end, CrowdCast first applies online machine learning to map natural language tweets to a specific YouTube video. Then, tweets that indeed refer to videos are weighted by the perceived “influence” of the sender. Finally, the video’s spread is predicted through a sociological model, derived from the emerging structure of the graph over which the video-related tweets are (still) spreading. Combining metrics of influence and live structure, CrowdCast outputs sets of candidate videos identified as likely to become viral in the next few hours. We monitor Twitter for more than 30 days and find that CrowdCast’s real-time predictions demonstrate encouraging correlation with actual YouTube viewership in the near future.
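
The three stages described above (map tweets to videos, weight by sender influence, factor in the structure of the spread) can be illustrated with a minimal sketch. This is not CrowdCast's actual model, which the abstract does not specify. It assumes that tweets have already been mapped to a video identifier, uses the logarithm of the follower count as a stand-in for "influence", and uses the fraction of reshare edges per participant as a stand-in for "live structure". The function name `rank_videos`, the input layout, and the scoring formula are all hypothetical.

```python
import math
from collections import defaultdict


def rank_videos(tweets, followers, top_k=3):
    """Rank videos by a toy score: summed sender influence times a spread factor.

    tweets:    iterable of (user, video_id, parent_user) tuples; video_id is None
               when the tweet was not mapped to a video, parent_user is None when
               the tweet is not a reshare (hypothetical input layout).
    followers: dict mapping user -> follower count (assumed influence proxy).
    """
    influence = defaultdict(float)  # video_id -> summed influence weight
    edges = defaultdict(set)        # video_id -> distinct (parent, child) reshare edges
    users = defaultdict(set)        # video_id -> distinct participating users

    for user, video_id, parent in tweets:
        if video_id is None:
            continue  # tweet does not refer to any video
        # Assumption: influence grows with the log of the follower count.
        influence[video_id] += math.log1p(followers.get(user, 0))
        users[video_id].add(user)
        if parent is not None:
            edges[video_id].add((parent, user))

    scores = {}
    for video_id, weight in influence.items():
        # Assumption: more reshare edges per participant means a livelier cascade.
        spread = 1.0 + len(edges[video_id]) / len(users[video_id])
        scores[video_id] = weight * spread

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    sample_tweets = [
        ("a", "v1", None),   # original post about video v1
        ("b", "v1", "a"),    # b reshares a's post
        ("c", "v2", None),   # original post about video v2
        ("d", None, None),   # unrelated tweet, ignored
    ]
    sample_followers = {"a": 100, "b": 10, "c": 1000}
    for video_id, score in rank_videos(sample_tweets, sample_followers):
        print(video_id, round(score, 2))
```

In a streaming setting, the per-video dictionaries would be updated incrementally as tweets arrive, and the ranking would be recomputed over a sliding time window; the batch form is used here only to keep the sketch short.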
