Estimating the number of communities in a network

Community detection, the division of a network into dense subnetworks with only sparse connections between them, has been a topic of vigorous study in recent years. However, while there exists a range of effective methods for dividing a network into a specified number of communities, it remains an open question how to determine how many communities one should use. Here we describe a mathematically principled approach for finding the number of communities in a network by maximizing the integrated likelihood of the observed network structure under an appropriate generative model. We demonstrate the approach on a range of benchmark networks, both real and computer generated.
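
To make the idea of maximizing an integrated likelihood concrete, the following is a minimal sketch, not the method of the paper. It assumes a simple Bernoulli stochastic block model, a uniform Beta(1,1) prior on every block connection probability, and a uniform prior over group assignments. With those assumptions the integral over block probabilities has a closed form, and the sum over assignments is done by brute-force enumeration, which is only feasible for very small networks. The function names and the toy graph are hypothetical.

```python
from itertools import combinations, product
from math import exp, lgamma, log


def log_block_likelihood(edge_set, n, labels):
    """log P(A | g) with block probabilities integrated out.

    Assumes a Bernoulli block model with an independent Beta(1,1)
    prior on each block-pair probability, so each block pair with
    e edges among t node pairs contributes log B(e + 1, t - e + 1).
    """
    pairs = {}  # number of node pairs per block pair
    hits = {}   # number of observed edges per block pair
    for i, j in combinations(range(n), 2):
        key = tuple(sorted((labels[i], labels[j])))
        pairs[key] = pairs.get(key, 0) + 1
        if (i, j) in edge_set:
            hits[key] = hits.get(key, 0) + 1
    total = 0.0
    for key, t in pairs.items():
        e = hits.get(key, 0)
        total += lgamma(e + 1) + lgamma(t - e + 1) - lgamma(t + 2)
    return total


def log_evidence(edges, n, k):
    """log P(A | k): sum over all k**n assignments, uniform prior on g."""
    edge_set = {(min(i, j), max(i, j)) for i, j in edges}
    terms = [log_block_likelihood(edge_set, n, g)
             for g in product(range(k), repeat=n)]
    top = max(terms)
    # log-sum-exp for numerical stability, then the prior weight k**(-n)
    return top + log(sum(exp(x - top) for x in terms)) - n * log(k)


# Toy network (hypothetical): two 4-node cliques joined by a single edge.
clique_a = list(combinations(range(0, 4), 2))
clique_b = list(combinations(range(4, 8), 2))
edges = clique_a + clique_b + [(3, 4)]

scores = {k: log_evidence(edges, 8, k) for k in (1, 2, 3)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The exhaustive sum costs k^n evaluations, so for networks of realistic size it would have to be replaced by an approximation such as Monte Carlo sampling over assignments. The choice of model and priors also matters: a different generative model can give a different preferred number of communities for the same network.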
