Parsimonious module inference in large networks.

We investigate the detectability of modules in large networks when the number of modules is not known in advance. We employ the minimum description length principle, which seeks to minimize the total amount of information required to describe the network and thereby avoids overfitting. Using this criterion, we obtain general bounds on the detectability of any prescribed block structure, given the number of nodes and edges in the sampled network. We also find that the maximum number of detectable blocks scales as √N, where N is the number of nodes in the network, for a fixed average degree ⟨k⟩. We further show that the simplicity of the minimum description length approach yields an efficient multilevel Monte Carlo inference algorithm with a complexity of O(τ N log N) if the number of blocks is unknown, and O(τ N) if it is known, where τ is the mixing time of the Markov chain. We illustrate the application of the method on a large network of actors and films with over 10^6 edges and a disassortative, bipartite block structure.
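To make the minimum description length criterion concrete, the following is a minimal sketch of a two-part code length for a simple undirected graph under a given block partition. It is an illustration built from elementary counting arguments, not necessarily the exact expression used in the paper. The function names (`description_length`, `log_binom`, `two_cliques`) and the toy graph are assumptions introduced here. The data cost counts the graphs compatible with the block-level edge counts, and the model cost counts the ways of choosing those edge counts plus the node-to-block assignment.

```python
import math
from collections import Counter


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.log(math.comb(n, k))


def description_length(edges, blocks):
    """Two-part description length (in nats) of a simple undirected graph.

    edges  : list of (u, v) pairs with u != v, each edge listed once
    blocks : blocks[u] is the block label of node u
    This is an illustrative counting-based code, assumed for this sketch.
    """
    n_nodes = len(blocks)
    n_edges = len(edges)
    sizes = Counter(blocks)
    n_blocks = len(sizes)

    # Number of edges observed between each unordered pair of blocks.
    counts = Counter()
    for u, v in edges:
        r, s = sorted((blocks[u], blocks[v]))
        counts[(r, s)] += 1

    # Data cost: log of the number of ways to place the observed edges
    # among the available node pairs of each block pair.
    data_cost = 0.0
    for (r, s), m in counts.items():
        if r == s:
            slots = sizes[r] * (sizes[r] - 1) // 2
        else:
            slots = sizes[r] * sizes[s]
        data_cost += log_binom(slots, m)

    # Model cost: ways to distribute n_edges among the block pairs
    # (a multiset coefficient), plus the node-to-block assignment.
    n_pairs = n_blocks * (n_blocks + 1) // 2
    model_cost = log_binom(n_pairs + n_edges - 1, n_edges)
    model_cost += n_nodes * math.log(n_blocks)
    return data_cost + model_cost


def two_cliques(size):
    """Toy graph: two cliques of `size` nodes joined by a single edge."""
    edges = [(u, v)
             for base in (0, size)
             for u in range(base, base + size)
             for v in range(u + 1, base + size)]
    edges.append((0, size))
    return edges


edges = two_cliques(10)
dl_one = description_length(edges, [0] * 20)                 # a single block
dl_planted = description_length(edges, [0] * 10 + [1] * 10)  # the two cliques
dl_single = description_length(edges, list(range(20)))       # one block per node

print(dl_one, dl_planted, dl_single)
```

On this toy graph the planted two-block partition gives the shortest description. The single-block partition pays a large data cost, while the partition with one block per node pays a large model cost, which is the sense in which the criterion penalizes overfitting.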
