Detecting Large Reshare Cascades in Social Networks

Detecting large reshare cascades is an important problem in online social networks. Existing attempts to model it range from time series analysis to stochastic processes. Most of these approaches depend heavily on features of the underlying network and use network information to detect the virality of cascades. In most cases, however, obtaining such detailed network information is hard or even impossible. In contrast, we propose SANSNET, a network-agnostic approach. Our method can be used to answer two important questions: (1) Will a cascade go viral? and (2) How early can we predict it? We use techniques from survival analysis to build a supervised classifier in the space of survival probabilities and show that the optimal decision boundary is a survival function. A notable feature of our approach is that it does not use any network-based features for the prediction tasks, making it very cheap to implement. Finally, we evaluate our approach on several real-life data sets, including popular social networks like Facebook and Twitter, on metrics such as recall, F-measure and breakout coverage. We find that the network-agnostic SANSNET classifier outperforms several non-trivial competitors and baselines that utilize network information.
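
To make the notion of a survival probability concrete, the following is a minimal sketch of the standard Kaplan-Meier estimator applied to toy cascade data. It is not the SANSNET classifier, whose construction the abstract does not specify. The event definition (a cascade "dies" when it stops receiving reshares, and is censored if it is still active when observation ends), the function names, and the data are all assumptions made for illustration. Only timing information is used, with no network features.

```python
from bisect import bisect_right

def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function S(t).

    durations: time each cascade was followed (hypothetical units, e.g. hours).
    observed:  True if the cascade was seen to stop growing at that time,
               False if it was still active when observation ended (censored).
    Returns (event_times, survival), where survival[i] is S(t) from
    event_times[i] up to the next event time.
    """
    event_times = sorted({d for d, o in zip(durations, observed) if o})
    survival = []
    s = 1.0
    for t in event_times:
        # Cascades still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Cascades observed to stop exactly at time t.
        events = sum(1 for d, o in zip(durations, observed) if o and d == t)
        s *= 1.0 - events / at_risk
        survival.append(s)
    return event_times, survival

def survival_at(t, event_times, survival):
    """Evaluate the right-continuous step function S(t)."""
    idx = bisect_right(event_times, t)
    return 1.0 if idx == 0 else survival[idx - 1]

# Toy data (assumed, not from the paper): five cascades, two censored.
toy_durations = [2, 3, 3, 5, 8]
toy_observed = [True, True, False, True, False]
toy_times, toy_surv = kaplan_meier(toy_durations, toy_observed)
```

Calling `survival_at(4, toy_times, toy_surv)` gives the estimated probability that a cascade is still growing after 4 time units. A classifier working in the space of survival probabilities would operate on quantities of this kind rather than on graph structure.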
