Descriptive vs. inferential community detection: pitfalls, myths and half-truths

Community detection is one of the most important methodological fields of network science, and one that has attracted a significant amount of attention over the past decades. This area deals with the automated division of a network into fundamental building blocks, with the objective of providing a summary of its large-scale structure. Despite its importance and widespread adoption, there is a noticeable gap between what is considered the state of the art and the methods that are actually used in practice in a variety of fields. Here we attempt to address this discrepancy by dividing existing methods according to whether they have a “descriptive” or an “inferential” goal. While descriptive methods find patterns in networks based on intuitive notions of community structure, inferential methods articulate a precise generative model and attempt to fit it to data. In this way, inferential methods are able to provide insights into the mechanisms of network formation, and to separate structure from randomness in a manner supported by statistical evidence. We review how employing descriptive methods with inferential aims is riddled with pitfalls and misleading answers, and thus should in general be avoided. We argue that inferential methods are more typically aligned with clearer scientific questions, yield more robust results, and should in many cases be preferred. We attempt to dispel some myths and half-truths often believed when community detection is employed in practice, in an effort to improve both the use of such methods and the interpretation of their results.
