Testing for Global Network Structure Using Small Subgraph Statistics

We study the problem of testing for community structure in networks using relations between the observed frequencies of small subgraphs. We propose a simple test for the existence of communities based only on the frequencies of three-node subgraphs. The test statistic is shown to be asymptotically normal under a null assumption of no community structure, and to have power approaching one under a composite alternative hypothesis of a degree-corrected stochastic block model. We also derive a version of the test that applies to multivariate Gaussian data. Our approach achieves near-optimal detection rates for the presence of community structure, in regimes where the signal-to-noise ratio is too weak for existing computationally efficient algorithms to explicitly estimate the communities themselves. We demonstrate how the method can be effective for detecting structure in social networks, citation networks for scientific articles, and correlations of stock returns between companies on the S&P 500.
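
To make the idea of three-node subgraph frequencies concrete, the following is a minimal sketch of how the empirical edge, vee (exactly two edges among three nodes), and triangle frequencies could be computed from an adjacency matrix. The abstract does not state the test statistic or its normalization, so the function name `three_node_frequencies`, the normalization of the vee frequency, and the simple gap `T_hat - E_hat**3` are assumptions made for illustration only. Under an Erdős–Rényi graph with edge probability p, the expected triangle frequency is p^3, so a gap far from zero hints at structure; this is not the paper's statistic and carries no calibrated null distribution.

```python
from math import comb

import numpy as np


def three_node_frequencies(A):
    """Return (E_hat, V_hat, T_hat) for a simple undirected graph.

    A is a symmetric 0/1 adjacency matrix with a zero diagonal.
    V_hat is normalized by 3 * C(n, 3) (an assumption of this sketch),
    so that its mean is p^2 (1 - p) under an Erdos-Renyi graph.
    """
    A = np.asarray(A, dtype=np.int64)
    n = A.shape[0]
    deg = A.sum(axis=1)
    edges = int(deg.sum()) // 2
    # Each triangle yields 6 closed walks of length 3.
    triangles = int(np.trace(A @ A @ A)) // 6
    # Paths of length 2 centered at each node; a triangle contains 3 of them.
    wedges = int((deg * (deg - 1) // 2).sum())
    vees = wedges - 3 * triangles  # triples with exactly two edges
    triples = comb(n, 3)
    return edges / comb(n, 2), vees / (3 * triples), triangles / triples


def sample_graph(P, rng):
    """Sample a symmetric adjacency matrix with independent edges, P[i, j] = edge prob."""
    upper = np.triu(rng.random(P.shape) < P, k=1).astype(np.int64)
    return upper + upper.T


rng = np.random.default_rng(0)
n = 200

# Null-like case: a single edge probability everywhere.
P_null = np.full((n, n), 0.10)

# Hypothetical two-community alternative with the same average density.
labels = np.arange(n) % 2
same = labels[:, None] == labels[None, :]
P_alt = np.where(same, 0.16, 0.04)

for name, P in [("no communities", P_null), ("two communities", P_alt)]:
    E_hat, V_hat, T_hat = three_node_frequencies(sample_graph(P, rng))
    print(f"{name}: E={E_hat:.4f} V={V_hat:.4f} T={T_hat:.5f} gap={T_hat - E_hat**3:.5f}")
```

The counts rely only on the degree sequence and the trace of the cubed adjacency matrix, so no explicit enumeration of triples is needed; for large sparse graphs a sparse matrix representation would be the natural substitute.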
