Stable exponential random graph models with non-parametric components for large dense networks

Abstract Exponential random graph models (ERGMs) behave peculiarly in large networks with thousands of actors (nodes). Standard models that include 2-star or triangle counts as statistics are often unstable, leading to completely full or completely empty networks. Moreover, numerical methods break down, which makes it difficult to apply ERGMs to large networks. In this paper we propose two strategies to circumvent these obstacles. First, we use a subsampling scheme to obtain (conditionally) independent observations for model fitting; second, we show how linear statistics (such as 2-stars) can be replaced by smooth functional components. In combination, these two steps allow stable models to be fitted to large network data, which we illustrate with a data example that includes a residual analysis.
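For readers unfamiliar with the statistics named above, the following is a minimal sketch of how 2-star and triangle counts can be computed from an adjacency matrix. It assumes an undirected simple graph stored as a symmetric 0/1 NumPy array with a zero diagonal; the function names are illustrative and are not taken from the paper or from any ERGM package.

```python
import numpy as np


def two_star_count(adj: np.ndarray) -> int:
    """Number of 2-stars: pairs of edges that share a common node.

    A node of degree d is the centre of d * (d - 1) / 2 two-stars.
    """
    deg = adj.sum(axis=1)
    return int((deg * (deg - 1) // 2).sum())


def triangle_count(adj: np.ndarray) -> int:
    """Number of triangles: trace(A^3) counts each triangle 6 times."""
    return int(np.trace(adj @ adj @ adj) // 6)


# Illustrative input: the complete graph on 4 nodes.
k4 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
print(two_star_count(k4), triangle_count(k4))
```

Both counts grow quickly with network density, which gives some intuition for why models that are linear in these statistics can tip towards nearly full or nearly empty graphs.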
