Learning the graph of epidemic cascades

We consider the problem of finding the graph on which an epidemic spreads, given only the times at which each node gets infected. While this is a problem of central importance in several contexts -- offline and online social networks, e-commerce, epidemiology -- there has been very little work, analytical or empirical, on finding the graph. Clearly, it is impossible to do so from just one epidemic; our interest is in learning the graph from a small number of independent epidemics. For the classic and popular "independent cascade" epidemics, we analytically establish sufficient conditions on the number of epidemics for both the global maximum-likelihood (ML) estimator and a natural greedy algorithm to succeed with high probability. Both results are based on a key observation: the global graph learning problem decouples into n local problems, one for each node. For a node of degree d, we show that its neighborhood can be reliably found once it has been infected O(d^2 log n) times (for ML on general graphs) or O(d log n) times (for greedy on trees). We also provide a corresponding information-theoretic lower bound of Ω(d log n); thus our bounds are essentially tight. Furthermore, if we are given side-information in the form of a super-graph of the actual graph (as is often the case), then the number of epidemic samples required, in all cases, becomes independent of the network size n.
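
To make the setting concrete, here is a minimal sketch of the data the learner sees and of why the problem splits into one local problem per node. It assumes a discrete-time variant of the independent cascade model with a single infection probability p shared by all edges; the abstract does not fix these details, and the names simulate_cascade and candidate_parents are hypothetical. It is not the ML or greedy estimator analyzed in the paper: it only collects, for one node, the nodes infected exactly one step earlier in each cascade, which under this assumed model must include at least one true neighbor.

```python
import random


def simulate_cascade(neighbors, p, seeds, rng):
    """Simulate one discrete-time independent cascade (assumed model).

    Each newly infected node gets a single chance, in the next time step,
    to infect each still-susceptible neighbor independently with
    probability p. Returns {node: infection_time} for infected nodes only.
    """
    times = {s: 0 for s in seeds}
    frontier = list(seeds)
    t = 0
    while frontier:
        t += 1
        newly_infected = []
        for u in frontier:
            for v in neighbors[u]:
                if v not in times and rng.random() < p:
                    times[v] = t
                    newly_infected.append(v)
        frontier = newly_infected
    return times


def candidate_parents(node, cascades):
    """Local view for a single node (illustrative, not the paper's estimator).

    For every cascade in which `node` was infected at some time t > 0,
    return the set of nodes infected at time t - 1. Only the node's own
    infection time and the times of the other nodes are needed, so each
    node can be handled separately.
    """
    sets = []
    for times in cascades:
        t = times.get(node)
        if t is None or t == 0:
            continue  # node was a seed or was never infected in this cascade
        sets.append({u for u, tu in times.items() if tu == t - 1})
    return sets


if __name__ == "__main__":
    # Hypothetical 4-node path graph 0 - 1 - 2 - 3, given as adjacency lists.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    rng = random.Random(0)
    cascades = [simulate_cascade(path, 0.7, [rng.choice(list(path))], rng)
                for _ in range(20)]
    print(candidate_parents(2, cascades))
```

The number of cascades in which a given node is infected, rather than the total number of cascades, is what limits how much can be learned about that node's neighborhood, which matches the way the abstract states its bounds.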
