Inferring Networks of Diffusion and Influence

Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected with a virus or publish a piece of information, observing individual transmissions (who infects whom, or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which diffusion and propagation take place is itself unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the network that best explains the observed infection times. Since this optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and finds provably near-optimal networks. We demonstrate the effectiveness of our approach by tracing information diffusion in a set of 170 million blogs and news articles over a one-year period to infer how information flows through the online media space. We find that the diffusion network of news for the top 1,000 media sites and blogs tends to have a core-periphery structure, with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence, with more general news media sites acting as connectors between them.
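
To make the inference task concrete, the following is a minimal sketch of how a network might be recovered from infection times alone. It is not the algorithm of the paper, which the abstract does not specify. It assumes a greedy edge-selection heuristic, an exponential transmission-delay model, and a fixed baseline for infections from outside the network. The names greedy_infer, objective and edge_score, and the parameters alpha and eps, are all hypothetical and introduced only for illustration.

```python
import math
from itertools import permutations


def edge_score(t_u, t_v, alpha=1.0, eps=1e-3):
    """Log-likelihood gain of 'u infected v' over an external-source baseline.

    Assumed model: transmission weight exp(-(t_v - t_u) / alpha); eps is the
    assumed weight of an infection coming from outside the network.
    Returns 0 when u was not infected strictly before v.
    """
    if t_u >= t_v:
        return 0.0
    return max(0.0, -(t_v - t_u) / alpha - math.log(eps))


def objective(edges, cascades, alpha=1.0, eps=1e-3):
    """Sum, over cascades and infected nodes, of the best available parent's gain."""
    total = 0.0
    for times in cascades:  # each cascade maps node -> infection time
        for v, t_v in times.items():
            best = 0.0
            for u, x in edges:
                if x == v and u in times:
                    best = max(best, edge_score(times[u], t_v, alpha, eps))
            total += best
    return total


def greedy_infer(cascades, k, alpha=1.0, eps=1e-3):
    """Greedily add up to k directed edges, each with the largest marginal gain."""
    nodes = sorted({n for c in cascades for n in c})
    candidates = set(permutations(nodes, 2))
    chosen, current = [], 0.0
    for _ in range(k):
        best_edge, best_gain = None, 0.0
        for e in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = objective(chosen + [e], cascades, alpha, eps) - current
            if gain > best_gain + 1e-12:
                best_edge, best_gain = e, gain
        if best_edge is None:  # no remaining edge improves the objective
            break
        chosen.append(best_edge)
        candidates.remove(best_edge)
        current += best_gain
    return chosen


# Toy data: three cascades, each mapping a node to its infection time.
cascades = [{"a": 0, "b": 1, "c": 2}, {"a": 0, "b": 1}, {"b": 0, "c": 1}]
inferred = greedy_infer(cascades, k=5)
```

On the toy cascades above, the sketch selects the edges a→b and b→c and then stops, since no further edge explains any infection better than the parents already chosen. Under the stated assumptions the objective is monotone and submodular in the edge set, which is the usual reason a greedy procedure of this kind can carry an approximation guarantee.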
