A Model-Free Approach to Infer the Diffusion Network from Event Cascade

Information diffusion through various types of networks, such as social networks and media networks, is a very common phenomenon on the Internet. In many scenarios, we can track only the time at which information reaches a node, while the source that infected the node is usually unobserved. Inferring the underlying diffusion network from cascade data (observed sequences of infected nodes with their timestamps) without additional information is an essential and challenging task in information diffusion. Many studies have focused on constructing complex models to infer the underlying diffusion network in a parametric way. However, real-world diffusion processes are very complex and hard to capture with a parametric model. Even worse, inferring the parameters of a complex model is impractical for large data volumes. Unlike previous work that focuses on building models, we propose to interpret the diffusion process directly from the cascade data in a non-parametric way, and we design a novel and efficient algorithm named Non-Parametric Distributional Clustering (NPDC). Our algorithm infers the diffusion network from the statistical difference between the infection time intervals of node pairs connected by diffusion edges and those of node pairs with no diffusion edge. NPDC is a model-free approach because it does not define any transmission model between nodes in advance. We conduct experiments on synthetic data sets and on two large real-world data sets with millions of cascades. Our algorithm achieves substantially higher network inference accuracy and is orders of magnitude faster than state-of-the-art solutions.
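The abstract describes NPDC only at a high level. It compares the distributions of infection time intervals across node pairs and separates pairs joined by diffusion edges from all other pairs. The Python sketch below shows one way that idea could work. It is our own illustration and not the authors' algorithm. Several parts are our assumptions and are not taken from the paper:

- the two-sample Cramér-von Mises statistic as the distributional distance;
- the pooled intervals of all pairs as the "no edge" background;
- the rule that edges show shorter gaps than that background;
- the exact one-dimensional 2-means split.

All function names (`pairwise_intervals`, `cvm_and_direction`, `two_means_threshold`, `infer_edges`) and the `min_support` parameter are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import combinations


def pairwise_intervals(cascades):
    """Collect t_v - t_u for every ordered pair (u, v) that co-occurs in a
    cascade with u infected strictly before v. Each cascade is a dict that
    maps node -> infection time."""
    gaps = defaultdict(list)
    for cascade in cascades:
        ordered = sorted(cascade.items(), key=lambda item: item[1])
        for (u, tu), (v, tv) in combinations(ordered, 2):
            if tu < tv:  # skip simultaneous infections
                gaps[(u, v)].append(tv - tu)
    return gaps


def cvm_and_direction(x, y):
    """Two-sample Cramer-von Mises statistic between samples x and y, plus the
    summed ECDF difference sum(F_x - F_y) over the pooled points. A positive
    direction means the values in x tend to be smaller than those in y."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    diffs = [bisect_right(xs, z) / n - bisect_right(ys, z) / m for z in xs + ys]
    stat = n * m / (n + m) ** 2 * sum(d * d for d in diffs)
    return stat, sum(diffs)


def two_means_threshold(values):
    """Exact 1-D 2-means: try every split of the sorted values and return the
    midpoint of the split with the smallest within-cluster sum of squares."""
    xs = sorted(values)
    if len(xs) < 2:
        return 0.0  # nothing to split; any positive score counts as an edge

    def sse(group):
        mu = sum(group) / len(group)
        return sum((g - mu) ** 2 for g in group)

    best_cost, best_thr = float("inf"), 0.0
    for k in range(1, len(xs)):
        cost = sse(xs[:k]) + sse(xs[k:])
        if cost < best_cost:
            best_cost, best_thr = cost, (xs[k - 1] + xs[k]) / 2
    return best_thr


def infer_edges(cascades, min_support=5):
    """Hypothetical NPDC-style sketch (not the published algorithm): score each
    ordered pair by how far its interval distribution departs from the pooled
    background, split the scores into two clusters, and keep the high-score
    cluster as inferred diffusion edges."""
    gaps = pairwise_intervals(cascades)
    # Background: intervals pooled over all pairs, most of which are non-edges.
    background = [g for sample in gaps.values() for g in sample]
    scores = {}
    for pair, sample in gaps.items():
        if len(sample) < min_support:
            continue  # too few co-occurrences to compare distributions
        stat, direction = cvm_and_direction(sample, background)
        # Assumption: true diffusion edges show *shorter* gaps than background.
        scores[pair] = stat if direction > 0 else 0.0
    threshold = two_means_threshold(list(scores.values()))
    return {pair for pair, s in scores.items() if s > threshold}


if __name__ == "__main__":
    # Toy cascades: a -> b and c -> d are fast hops; all other pairs are slow.
    demo = [{"a": 0, "b": 1 + i % 3, "c": 20 + i, "d": 21 + i + i % 3}
            for i in range(10)]
    print(sorted(infer_edges(demo)))  # prints [('a', 'b'), ('c', 'd')]
```

The Cramér-von Mises statistic depends only on the ranks of the values, so the sketch does not change if the timestamps use different time units. It also makes no assumption about the shape of the delay distribution, which fits the model-free framing. The sketch has three limits. It enumerates all pairs within each cascade. Its 2-means search is quadratic. It always splits the scores into two groups. It therefore does not reflect the efficiency the abstract reports, and it would need candidate pruning and a different split rule to handle real data sets with millions of cascades.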
