Scalable Approximate Bayesian Computation for Growing Network Models via Extrapolated and Sampled Summaries

Approximate Bayesian computation (ABC) is a simulation-based, likelihood-free method applicable to both model selection and parameter estimation. ABC parameter estimation requires the ability to forward simulate datasets from a candidate model, and because the sizes of the observed and simulated datasets usually need to match, this simulation can be computationally expensive. Additionally, since ABC inference is based on comparisons of summary statistics computed on the observed and simulated data, computationally expensive summary statistics lead to further losses in efficiency. ABC has recently been applied to the family of mechanistic network models, an area that has traditionally lacked tools for inference and model choice. Mechanistic models of network growth repeatedly add nodes to a network until it reaches the size of the observed network, which may be on the order of millions of nodes. With ABC, this process can quickly become computationally prohibitive because both the network simulations and the evaluation of summary statistics are resource-intensive. We propose two methodological developments to enable the use of ABC for inference in models for large growing networks. First, to reduce the time needed for forward simulating model realizations, we propose a procedure to extrapolate (via both least squares and Gaussian processes) summary statistics from small to large networks. Second, to reduce computation time for evaluating summary statistics, we use sample-based rather than census-based summary statistics. We show that the ABC posterior obtained through this approach, which adds two layers of approximation to standard ABC, is similar to a classic ABC posterior. Although we deal with growing network models, both extrapolated summaries and sampled summaries are expected to be relevant in other ABC settings where the data are generated incrementally.
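The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: the growth model is a simple preferential-attachment process standing in for a mechanistic network model, the extrapolated summary is the maximum degree fitted by least squares on a log-log scale, and the sampled summary is the mean local clustering coefficient over a random subset of nodes. The Gaussian process variant of the extrapolation is not shown, and all function names are hypothetical.

```python
import math
import random


def grow_and_record(n_final, m, checkpoints, rng):
    """Grow a preferential-attachment network (illustrative stand-in model)
    and record the maximum degree whenever the size hits a checkpoint."""
    adj = [set() for _ in range(n_final)]
    stubs = []  # each node appears once per incident edge
    for i in range(m + 1):  # seed graph: clique on m + 1 nodes
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
            stubs += [i, j]
    recorded = {}
    for new in range(m + 1, n_final):
        targets = set()
        while len(targets) < m:  # m distinct targets, chosen proportionally to degree
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
        size = new + 1
        if size in checkpoints:
            recorded[size] = max(len(nbrs) for nbrs in adj[:size])
    return adj, recorded


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def extrapolate_summary(recorded, n_target):
    """Fit log(summary) against log(size) and predict the summary at n_target."""
    sizes = sorted(recorded)
    a, b = fit_line([math.log(s) for s in sizes],
                    [math.log(recorded[s]) for s in sizes])
    return math.exp(a + b * math.log(n_target))


def local_clustering(adj, v):
    """Fraction of pairs of neighbours of v that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def mean_clustering(adj, nodes):
    """Mean local clustering over the given nodes (all nodes gives the census value)."""
    nodes = list(nodes)
    return sum(local_clustering(adj, v) for v in nodes) / len(nodes)


rng = random.Random(0)
# Simulate only up to 2,000 nodes, recording the summary along the way.
adj, recorded = grow_and_record(2000, 3, {250, 500, 1000, 2000}, rng)
# Extrapolated summary: predicted maximum degree at 20,000 nodes.
predicted = extrapolate_summary(recorded, 20000)
# Sampled summary: clustering from 200 random nodes versus the full census.
census = mean_clustering(adj, range(len(adj)))
sampled = mean_clustering(adj, rng.sample(range(len(adj)), 200))
```

In an ABC run, the simulated summaries compared with the observed ones would be values such as `predicted` and `sampled`, so each accepted or rejected parameter draw costs a small simulation and a partial summary evaluation rather than a full-size one. Whether a log-log linear trend is adequate depends on the statistic and the model, which is presumably why a more flexible Gaussian process extrapolation is also considered.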
