Nonparametric Link Prediction in Large Scale Dynamic Networks

We propose a nonparametric approach to link prediction in large-scale dynamic networks. Our model uses graph-based features of pairs of nodes, together with features of their local neighborhoods, to predict whether those nodes will be linked at each time step. The model allows for different types of evolution in different parts of the graph (e.g., growing or shrinking communities). We focus on large-scale graphs and present an implementation that uses locality-sensitive hashing to scale to large problems. Experiments on simulated data and on five real-world dynamic graphs show that we outperform the state of the art, especially when sharp fluctuations or nonlinearities are present. We also establish theoretical properties of our estimator, in particular consistency and weak convergence; the latter relies on an elaboration of Stein's method for dependency graphs.
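To make the idea of predicting links from the observed behavior of similar node pairs concrete, the following is a minimal sketch and not the implementation described above. It assumes each node pair is summarized by a small numeric feature vector (for example, common-neighbor count and a last-step link indicator; these features are illustrative assumptions), groups vectors into buckets with random-hyperplane hashing (one particular form of locality-sensitive hashing, chosen here for brevity), and estimates the link probability as the fraction of past pairs in the same bucket that became linked. The names `BucketedLinkEstimator`, `lsh_key`, and `make_hyperplanes` are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions (assumed hashing scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vec, planes):
    """Hash a feature vector to a bit tuple: one sign bit per hyperplane."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


class BucketedLinkEstimator:
    """Hypothetical estimator: empirical link frequency within each hash bucket."""

    def __init__(self, dim, n_bits=8, seed=0):
        self.planes = make_hyperplanes(dim, n_bits, seed)
        # bucket key -> [number of pairs that became linked, number of pairs seen]
        self.counts = defaultdict(lambda: [0, 0])

    def observe(self, features, linked):
        """Record one past node pair and whether it was linked at the next step."""
        cell = self.counts[lsh_key(features, self.planes)]
        cell[0] += int(linked)
        cell[1] += 1

    def predict(self, features):
        """Return the link frequency in the matching bucket, or None if it is empty."""
        cell = self.counts.get(lsh_key(features, self.planes))
        if cell is None:
            return None
        formed, seen = cell
        return formed / seen


if __name__ == "__main__":
    est = BucketedLinkEstimator(dim=3)
    # Illustrative feature vectors: (common neighbors, linked last step, degree product).
    est.observe([2.0, 1.0, 6.0], True)
    est.observe([2.0, 1.0, 6.0], False)
    print(est.predict([2.0, 1.0, 6.0]))
```

Bucketing by hash key avoids comparing a query pair against every past pair, which is the usual motivation for locality-sensitive hashing at scale. A hard bucket is a coarse stand-in for smoother similarity weighting, and a fuller treatment would also handle empty buckets rather than returning `None`.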
