Back to the Past: Source Identification in Diffusion Networks from Partially Observed Cascades

When malicious information spreads widely through an information diffusion network, can we identify the source node that originally introduced it and infer the time at which that node initiated the spread? Being able to do so is critical for curtailing the spread of malicious information and reducing the potential losses incurred. The problem is challenging because typically only incomplete traces are observed, and these incomplete traces must be unrolled into the past in order to pinpoint the source. In this paper, we tackle the problem with a two-stage framework: it first learns a continuous-time diffusion network model from historical diffusion traces, and then identifies the source of an incomplete diffusion trace by maximizing the likelihood of the trace under the learned model. Experiments on both large synthetic and real-world data show that our framework can effectively go back to the past and pinpoint the source node and its initiation time significantly more accurately than previous state-of-the-art methods.
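To make the second stage concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the first stage has already produced exponential transmission rates per edge (the hypothetical `rates` dictionary), and it swaps the likelihood maximization for a simpler Monte Carlo least-squares surrogate: for each candidate source, sampled shortest-path delays give an expected arrival offset for every observed node, and the start time is chosen to best align those offsets with the observed infection times. All function names, the toy network, and the observed times are illustrative assumptions.

```python
import heapq
import random


def shortest_delays(delays, source):
    """Dijkstra over one sample of edge delays: earliest arrival offset per node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in delays.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def sample_delays(rates, rng):
    """Draw one exponential transmission delay for every directed edge."""
    return {u: {v: rng.expovariate(r) for v, r in nbrs.items()}
            for u, nbrs in rates.items()}


def identify_source(rates, observed, n_samples=200, seed=0):
    """Return (source, start_time) under a least-squares surrogate (assumption)."""
    rng = random.Random(seed)
    nodes = set(rates) | {v for nbrs in rates.values() for v in nbrs}
    samples = [sample_delays(rates, rng) for _ in range(n_samples)]
    best = None
    for s in sorted(nodes):
        totals = {i: 0.0 for i in observed}
        reachable = True
        for delays in samples:
            dist = shortest_delays(delays, s)
            if any(i not in dist for i in observed):
                reachable = False  # s cannot explain every observed node
                break
            for i in observed:
                totals[i] += dist[i]
        if not reachable:
            continue
        mean = {i: totals[i] / n_samples for i in observed}
        # Least-squares start time, capped at the earliest observed infection.
        t0 = sum(observed[i] - mean[i] for i in observed) / len(observed)
        t0 = min(t0, min(observed.values()))
        score = -sum((observed[i] - t0 - mean[i]) ** 2 for i in observed)
        if best is None or score > best[0]:
            best = (score, s, t0)
    if best is None:
        raise ValueError("no candidate reaches all observed nodes")
    return best[1], best[2]


# Toy network (hypothetical): hub "s" with arms a -> a2, b, and c, all rates 1.0.
rates = {
    "s": {"a": 1.0, "b": 1.0, "c": 1.0},
    "a": {"s": 1.0, "a2": 1.0},
    "a2": {"a": 1.0},
    "b": {"s": 1.0},
    "c": {"s": 1.0},
}
# Partial trace: infection times for three nodes only; "s" and "a" are unobserved.
observed = {"a2": 12.0, "b": 11.0, "c": 11.0}
source, start = identify_source(rates, observed)
print(source, round(start, 2))
```

The surrogate only matches mean arrival times, so it cannot separate candidates that lie on a single path to all observed nodes; a likelihood-based criterion of the kind described in the abstract uses the full delay distribution and is not subject to that particular limitation.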
