EM-based smooth graphon estimation using MCMC and spline-based approaches

The paper proposes estimating a smooth graphon function for network data analysis using the principles of the EM algorithm. The approach accounts both for variability with respect to the ordering of the nodes of a network and for smooth estimation of the graphon function by nonparametric regression. To this end, (linear) B-splines are used, which allow smooth estimation of the graphon conditional on the ordering of the nodes; this provides the M-step. The true ordering of the nodes arising from the graphon model remains unobserved, and Bayesian ideas are employed to obtain posterior samples of it given the network data; this yields the E-step. Combining both steps gives an EM-based approach to smooth graphon estimation. The proposed graphon estimate makes it possible to explore both the degree distribution and the ordering of the nodes with respect to their connectivity behavior. Variability and uncertainty are taken into account using MCMC techniques. Examples and a simulation study support the applicability of the approach.
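The alternation between the two steps can be sketched in Python with NumPy. This is a minimal illustration under simplifying assumptions, not the authors' implementation. The M-step here is an unpenalized least-squares fit of a symmetric tensor product of linear B-splines (hat functions) on equally spaced knots, with coefficients clipped to [0, 1]. The E-step is a Metropolis-Hastings sampler over node orderings that uses random pairwise swaps and a uniform prior, and the posterior mean positions are re-ranked to give the ordering for the next M-step. All function names (`hat_basis`, `m_step`, `e_step`, `fit_graphon`) and default settings are hypothetical.

```python
import numpy as np

# Hypothetical sketch of an EM-style smooth graphon fit; the paper's exact
# E-step and (penalized/constrained) M-step are NOT reproduced here.


def hat_basis(x, knots):
    """Linear B-spline (hat) basis on equally spaced knots; shape (len(x), K).

    For x in [knots[0], knots[-1]] every row is non-negative and sums to 1.
    """
    x = np.asarray(x, dtype=float)
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)


def graphon(theta, knots, u, v):
    """Evaluate w(u, v) = sum_kl theta[k, l] B_k(u) B_l(v) on all pairs (u_i, v_j)."""
    return hat_basis(u, knots) @ theta @ hat_basis(v, knots).T


def m_step(Y, u, knots):
    """Least-squares spline fit given node positions u.

    Assumption: no penalty; clipping stands in for proper constraints.
    """
    n, K = len(u), len(knots)
    B = hat_basis(u, knots)
    iu, ju = np.triu_indices(n, k=1)                # each node pair once
    X = np.einsum("pk,pl->pkl", B[iu], B[ju])       # tensor-product design
    X = 0.5 * (X + X.transpose(0, 2, 1))            # make w symmetric in (u, v)
    theta, *_ = np.linalg.lstsq(X.reshape(len(iu), K * K), Y[iu, ju], rcond=None)
    theta = theta.reshape(K, K)
    # Hat functions are a non-negative partition of unity, so coefficients
    # in [0, 1] guarantee w(u, v) in [0, 1].
    return np.clip(0.5 * (theta + theta.T), 0.0, 1.0)


def loglik(Y, u, theta, knots, eps=1e-6):
    """Bernoulli log-likelihood of the upper-triangular adjacency entries."""
    P = np.clip(graphon(theta, knots, u, u), eps, 1.0 - eps)
    iu, ju = np.triu_indices(len(u), k=1)
    y, p = Y[iu, ju], P[iu, ju]
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def e_step(Y, ranks, theta, knots, n_iter, rng):
    """Metropolis-Hastings over node orderings.

    Assumption: uniform prior over permutations, random pairwise swaps.
    Returns each node's posterior mean position on the grid.
    """
    n = len(ranks)
    grid = (np.arange(n) + 0.5) / n
    cur = loglik(Y, grid[ranks], theta, knots)
    pos_sum = np.zeros(n)
    for _ in range(n_iter):
        i, j = rng.choice(n, size=2, replace=False)
        prop = ranks.copy()
        prop[i], prop[j] = ranks[j], ranks[i]       # swap two nodes' ranks
        new = loglik(Y, grid[prop], theta, knots)
        if np.log(rng.random()) < new - cur:       # symmetric proposal
            ranks, cur = prop, new
        pos_sum += grid[ranks]
    return pos_sum / n_iter


def fit_graphon(Y, n_knots=5, n_em=10, n_mcmc=500, seed=0):
    """Alternate E-step (sample orderings) and M-step (spline fit)."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    knots = np.linspace(0.0, 1.0, n_knots)
    grid = (np.arange(n) + 0.5) / n
    ranks = np.argsort(np.argsort(Y.sum(axis=1)))   # assumption: start from degree order
    theta = m_step(Y, grid[ranks], knots)
    for _ in range(n_em):
        pos_mean = e_step(Y, ranks, theta, knots, n_mcmc, rng)
        ranks = np.argsort(np.argsort(pos_mean))    # re-rank posterior means
        theta = m_step(Y, grid[ranks], knots)
    return theta, grid[ranks]
```

For a symmetric 0/1 adjacency matrix `Y` with zero diagonal, `theta, u_hat = fit_graphon(Y, n_knots=4)` returns the spline coefficients and an estimated position for each node. Sorting nodes by `u_hat` gives the estimated ordering. Integrating the fitted w(u, v) over v describes how expected degree varies along that ordering.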
