Community Detection through Likelihood Optimization: In Search of a Sound Model

Community detection is one of the most important problems in network analysis. Among the many algorithms proposed for this task, methods based on statistical inference are of particular interest: they are mathematically sound and have been shown to produce high-quality partitions. Statistical inference methods fit a random graph model (a.k.a. null model) to the observed network by maximizing the likelihood. The choice of this model is crucial and is the main focus of the current study. We provide an extensive theoretical and empirical analysis comparing several models: the widely used planted partition model, its recently proposed degree-corrected modification, and a new null model with some desirable statistical properties. We also develop and compare two likelihood optimization algorithms suitable for the models under consideration. An extensive empirical analysis on a variety of datasets shows, in particular, that according to the likelihood of the observed graph structures, the new model describes most of the considered real-world complex networks best.
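To make the idea of fitting a null model by maximizing the likelihood concrete, here is a minimal Python sketch for the simplest of the three models, a Bernoulli planted partition model. It assumes that every vertex pair is linked independently with probability p_in inside a community and p_out between communities, substitutes the maximum-likelihood estimates of both for a given partition, and then improves the partition by greedy single-vertex moves. The function names `ppm_log_likelihood` and `greedy_local_moves` are hypothetical. The greedy vertex mover is a generic heuristic, not either of the two optimization algorithms developed in the paper, and the degree-corrected and new models are not covered.

```python
import math


def ppm_log_likelihood(n, edges, labels):
    """Profile log-likelihood of a partition under a Bernoulli planted partition model.

    Assumption (sketch only): each unordered vertex pair is an edge independently,
    with probability p_in inside a community and p_out between communities, and
    both probabilities are replaced by their maximum-likelihood estimates.
    """
    # Number of vertex pairs that fall inside a community, from community sizes.
    sizes = {}
    for c in labels:
        sizes[c] = sizes.get(c, 0) + 1
    pairs_in = sum(s * (s - 1) // 2 for s in sizes.values())
    pairs_out = n * (n - 1) // 2 - pairs_in

    # Observed edges inside and between communities.
    m_in = sum(1 for u, v in edges if labels[u] == labels[v])
    m_out = len(edges) - m_in

    def bernoulli_ll(k, total):
        # k successes out of `total` trials at the MLE p = k / total (0 log 0 := 0).
        if k == 0 or k == total:
            return 0.0
        p = k / total
        return k * math.log(p) + (total - k) * math.log(1 - p)

    return bernoulli_ll(m_in, pairs_in) + bernoulli_ll(m_out, pairs_out)


def greedy_local_moves(n, edges, labels, max_sweeps=20):
    """Hypothetical vertex mover: relabel single vertices while the likelihood improves."""
    labels = list(labels)
    best = ppm_log_likelihood(n, edges, labels)
    for _ in range(max_sweeps):
        improved = False
        for v in range(n):
            current = labels[v]
            # Try moving v into every other existing community.
            for c in set(labels) - {current}:
                labels[v] = c
                score = ppm_log_likelihood(n, edges, labels)
                if score > best + 1e-12:
                    best, current, improved = score, c, True
                labels[v] = current  # keep the best label found so far
        if not improved:
            break
    return labels, best


# Two triangles joined by the single edge (2, 3), starting from a wrong split.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
labels, score = greedy_local_moves(6, edges, [0, 0, 1, 1, 1, 1])
print(labels, round(score, 3))  # prints [0, 0, 0, 1, 1, 1] -3.139
```

Recomputing the full likelihood after every tentative move keeps the sketch short but costs O(n + m) per move; a practical optimizer would instead update the intra-community edge and pair counts incrementally.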
