Testing Changes in Communities for the Stochastic Block Model

We propose and analyze the problems of \textit{community goodness-of-fit and two-sample testing} for stochastic block models (SBM), where changes arise from modifications to the community memberships of nodes. Motivated by practical applications, we consider the challenging sparse regime, where expected node degrees are constant and the inter-community mean degree ($b$) scales proportionally to the intra-community mean degree ($a$). Prior work has sharply characterized partial or full community recovery in terms of a "signal-to-noise ratio" ($\mathrm{SNR}$) based on $a$ and $b$. For both problems, we propose computationally efficient tests that can succeed far beyond the regime where recovery of community membership is even possible. For large changes, $s \gg \sqrt{n}$, where $s$ is the number of nodes whose membership changes and $n$ is the total number of nodes, we need only $\mathrm{SNR}= O(1)$, whereas a na\"ive test based on community recovery with $O(s)$ errors requires $\mathrm{SNR}= \Theta(\log n)$. Conversely, in the small change regime, $s \ll \sqrt{n}$, we show via an information-theoretic lower bound that, surprisingly, no algorithm can do better than the na\"ive algorithm that first estimates the communities up to $O(s)$ errors and then detects changes. We validate these phenomena numerically on SBMs and on real-world datasets, as well as on Markov Random Fields, where we observe only node data rather than the existence of links.
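To make the large-change setting concrete, the following is a minimal Python sketch, not the authors' test. It assumes a balanced two-community SBM with labels in $\{-1,+1\}$, within-community edge probability $a/n$, and cross-community edge probability $b/n$. The function names (`sample_sbm`, `edge_agreement`, `flip_nodes`) and the parameter values are hypothetical, chosen only for illustration. The statistic counts edges that agree with a hypothesized labelling minus edges that disagree, without attempting to recover the communities.

```python
import numpy as np

def sample_sbm(labels, a, b, rng):
    """Sample a symmetric 0/1 adjacency matrix from a two-community SBM.

    Assumption for this sketch: edges appear independently with
    probability a/n inside a community and b/n across communities.
    """
    n = len(labels)
    same = np.equal.outer(labels, labels)        # True where labels match
    probs = np.where(same, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # no self-loops
    return (upper | upper.T).astype(np.int8)

def edge_agreement(adj, labels):
    """Intra-community edge count minus inter-community edge count,
    both measured with respect to the hypothesized `labels`."""
    signs = np.outer(labels, labels)             # +1 same side, -1 otherwise
    return int(np.triu(adj * signs, k=1).sum())

def flip_nodes(labels, s, rng):
    """Return a copy of `labels` with s randomly chosen nodes switched."""
    changed = labels.copy()
    idx = rng.choice(len(labels), size=s, replace=False)
    changed[idx] *= -1
    return changed

rng = np.random.default_rng(0)
n, a, b, s = 400, 12.0, 3.0, 100                 # illustrative values, s >> sqrt(n)
x0 = np.repeat([1, -1], n // 2)                  # hypothesized communities
x1 = flip_nodes(x0, s, rng)                      # communities after the change

# Statistic under the null (graph drawn from x0) and under the change (from x1).
null_stats = [edge_agreement(sample_sbm(x0, a, b, rng), x0) for _ in range(50)]
alt_stats = [edge_agreement(sample_sbm(x1, a, b, rng), x0) for _ in range(50)]

print("mean statistic, no change:", np.mean(null_stats))
print("mean statistic, s changed:", np.mean(alt_stats))
```

Under these assumptions, switching $s$ nodes lowers the expected statistic by $s(n-s)(a-b)/n$, while its standard deviation stays on the order of $\sqrt{n(a+b)}$, so a threshold between the two empirical means separates the hypotheses in this toy setting.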
