Statistical inference of assortative community structures

We develop a principled methodology to infer assortative communities in networks based on a nonparametric Bayesian formulation of the planted partition model. We show that this approach succeeds in finding statistically significant assortative modules in networks, unlike alternatives such as modularity maximization, which systematically overfits in both artificial and empirical examples. In addition, we show that our method is not subject to a resolution limit and can uncover an arbitrarily large number of communities, as long as there is statistical evidence for them. Our formulation is amenable to model selection procedures, which allow us to compare it to more general approaches based on the stochastic block model and thereby reveal whether assortativity is in fact the dominant large-scale mixing pattern. We perform this comparison with several empirical networks, identify numerous cases where traditional community detection methods exaggerate the network's assortativity, and show how a more faithful degree of assortativity can be recovered.
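For readers unfamiliar with the planted partition model, the sketch below shows its classic parametric form. Nodes are split into B groups, pairs in the same group connect with probability p_in, pairs in different groups connect with probability p_out, and the structure is assortative when p_in > p_out. This is a minimal illustration under stated assumptions, not the nonparametric Bayesian formulation developed here, which does not treat the number of groups or the probabilities p_in and p_out as known inputs. The function names, the round-robin group assignment and the parameter values are hypothetical choices made only for illustration.

```python
import math
import random
from collections import Counter
from itertools import combinations


def sample_planted_partition(n, B, p_in, p_out, seed=None):
    """Sample a simple undirected graph from a parametric planted partition model.

    Hypothetical helper: nodes are assigned to B groups round-robin, then each
    node pair is connected with probability p_in (same group) or p_out
    (different groups). The structure is assortative when p_in > p_out.
    """
    rng = random.Random(seed)
    groups = [i % B for i in range(n)]
    edges = [
        (u, v)
        for u, v in combinations(range(n), 2)  # every unordered pair, u < v
        if rng.random() < (p_in if groups[u] == groups[v] else p_out)
    ]
    return groups, edges


def pp_log_likelihood(n, edges, groups, p_in, p_out):
    """Bernoulli log-likelihood of `edges` given a partition `groups`.

    Hypothetical helper; requires 0 < p_in < 1 and 0 < p_out < 1.
    """
    # Number of node pairs that lie inside a group vs. between groups.
    pairs_in = sum(s * (s - 1) // 2 for s in Counter(groups).values())
    pairs_out = n * (n - 1) // 2 - pairs_in
    # Number of observed edges of each kind.
    e_in = sum(1 for u, v in edges if groups[u] == groups[v])
    e_out = len(edges) - e_in
    return (
        e_in * math.log(p_in) + (pairs_in - e_in) * math.log(1 - p_in)
        + e_out * math.log(p_out) + (pairs_out - e_out) * math.log(1 - p_out)
    )


if __name__ == "__main__":
    n, B, p_in, p_out = 100, 4, 0.3, 0.02
    groups, edges = sample_planted_partition(n, B, p_in, p_out, seed=1)
    shuffled = groups[:]
    random.Random(2).shuffle(shuffled)
    # The planted partition should score higher than a random relabeling.
    print(pp_log_likelihood(n, edges, groups, p_in, p_out))
    print(pp_log_likelihood(n, edges, shuffled, p_in, p_out))
```

Comparing the two printed values shows the basic idea behind likelihood-based inference of assortative structure: the partition that was actually planted explains the observed edges better than a random relabeling with the same group sizes. Fixing B, p_in and p_out in advance is only a simplification for the sketch. Choosing them from the data without overfitting is the problem the nonparametric formulation addresses.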
