Statistical Inference on Random Dot Product Graphs: a Survey

The random dot product graph (RDPG) is an independent-edge random graph that is analytically tractable and, simultaneously, either encompasses or can successfully approximate a wide range of random graphs, from relatively simple stochastic block models to complex latent position graphs. In this survey paper, we describe a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices. We examine the analogues, in graph inference, of several canonical tenets of classical Euclidean inference: in particular, we summarize a body of existing results on the consistency and asymptotic normality of the adjacency and Laplacian spectral embeddings, and the role these spectral embeddings can play in the construction of single- and multi-sample hypothesis tests for graph data. We investigate several real-world applications, including community detection and classification in large social networks and the determination of functional and biologically relevant network properties from an exploratory data analysis of the Drosophila connectome. We outline requisite background and current open problems in spectral graph inference.
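As a rough illustration of the model and of the adjacency spectral embedding mentioned above, the following minimal sketch samples an RDPG from known latent positions and then estimates those positions from the adjacency matrix alone. The function names `sample_rdpg` and `adjacency_spectral_embedding`, the two-block latent positions, and the graph size are all assumptions chosen for illustration; they are not taken from the survey. The embedding here keeps the `d` algebraically largest eigenvalues, which is a simplification that suits positive semidefinite edge-probability matrices.

```python
import numpy as np


def sample_rdpg(X, rng):
    """Sample a symmetric, hollow adjacency matrix with P(A_ij = 1) = <X_i, X_j>."""
    P = X @ X.T                                    # edge probabilities; must lie in [0, 1]
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1)   # independent edges above the diagonal
    return (upper | upper.T).astype(float)         # symmetrize; diagonal stays zero


def adjacency_spectral_embedding(A, d):
    """Scale the top-d eigenvectors of A by the square roots of their eigenvalues."""
    vals, vecs = np.linalg.eigh(A)                 # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:d]               # indices of the d largest eigenvalues
    # Assumption: the top-d eigenvalues are positive; clip at zero as a guard.
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


# Hypothetical example: a two-block stochastic block model written as an RDPG.
rng = np.random.default_rng(0)
n, d = 400, 2
block_positions = np.array([[0.8, 0.2], [0.2, 0.8]])
labels = np.repeat([0, 1], n // 2)
X = block_positions[labels]                        # n x d matrix of true latent positions

A = sample_rdpg(X, rng)
X_hat = adjacency_spectral_embedding(A, d)

# X_hat recovers X only up to an orthogonal transformation, so compare the
# rotation-invariant quantities X_hat X_hat^T and P = X X^T instead.
P = X @ X.T
mean_abs_error = np.mean(np.abs(X_hat @ X_hat.T - P))
```

Because latent positions are identifiable only up to an orthogonal transformation, the sketch checks the estimated edge-probability matrix rather than the embedded points themselves.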
