An efficient and principled method for detecting communities in networks

A fundamental problem in the analysis of network data is the detection of network communities, groups of densely interconnected nodes, which may be overlapping or disjoint. Here we describe a method for finding overlapping communities based on a principled statistical approach using generative network models. We show how the method can be implemented using a fast, closed-form expectation-maximization algorithm that allows us to analyze networks of millions of nodes in reasonable running times. We test the method both on real-world networks and on synthetic benchmarks and find that it gives results competitive with previous methods. We also show that the same approach can be used to extract nonoverlapping community divisions via a relaxation method, and demonstrate that the algorithm is competitively fast and accurate for the nonoverlapping problem.

[1]  Kathryn B. Laskey,et al.  Stochastic blockmodels: First steps , 1983 .

[2]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[3]  P. Arabie,et al.  An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling , 1975 .

[4]  Donald E. Knuth,et al.  The Stanford GraphBase - a platform for combinatorial computing , 1993 .

[5]  P. Bickel,et al.  A nonparametric view of network models and Newman–Girvan and other modularities , 2009, Proceedings of the National Academy of Sciences.

[6]  M. Newman,et al.  The structure of scientific collaboration networks. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[7]  Ata Kabán,et al.  On an equivalence between PLSI and LDA , 2003, SIGIR.

[8]  Andrea Lancichinetti,et al.  Community detection algorithms: a comparative analysis: invited presentation, extended abstract , 2009, VALUETOOLS.

[9]  M. Newman,et al.  Finding community structure in networks using the eigenvectors of matrices. , 2006, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  Fei Wang,et al.  Community discovery using nonnegative matrix factorization , 2011, Data Mining and Knowledge Discovery.

[11]  Chris H. Q. Ding,et al.  Nonnegative Matrix Factorization for Combinatorial Optimization: Spectral Clustering, Graph Matching, and Clique Finding , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[12]  Jure Leskovec,et al.  The dynamics of viral marketing , 2005, EC '06.

[13]  R. Lambiotte,et al.  Line graphs, link partitions, and overlapping communities. , 2009, Physical review. E, Statistical, nonlinear, and soft matter physics.

[14]  Thomas Hofmann,et al.  Latent semantic models for collaborative filtering , 2004, TOIS.

[15]  K. A. Samani,et al.  Detecting overlapping community structure of networks based on vertex–vertex correlations , 2009 .

[16]  T. Snijders,et al.  Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure , 1997 .

[17]  Benjamin H. Good,et al.  Performance of modularity maximization in practical contexts. , 2009, Physical review. E, Statistical, nonlinear, and soft matter physics.

[18]  Michael T. Gastner,et al.  The spatial structure of networks , 2006 .

[19]  Marc Barthelemy,et al.  Spatial Networks , 2010, Encyclopedia of Social Network Analysis and Mining.

[20]  T. Vicsek,et al.  Uncovering the overlapping community structure of complex networks in nature and society , 2005, Nature.

[21]  Sune Lehmann,et al.  Link communities reveal multiscale complexity in networks , 2009, Nature.

[22]  Jure Leskovec,et al.  Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters , 2008, Internet Math..

[23]  Chris H. Q. Ding,et al.  On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing , 2008, Comput. Stat. Data Anal..

[24]  Bruce A. Reed,et al.  A Critical Point for Random Graphs with a Given Degree Sequence , 1995, Random Struct. Algorithms.

[25]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[26]  M. Newman,et al.  Random graphs with arbitrary degree distributions and their applications. , 2000, Physical review. E, Statistical, nonlinear, and soft matter physics.

[27]  F. Radicchi,et al.  Benchmark graphs for testing community detection algorithms. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[28]  Jon M. Kleinberg,et al.  Group formation in large social networks: membership, growth, and evolution , 2006, KDD '06.

[29]  M. Newman,et al.  Hierarchical structure and the prediction of missing links in networks , 2008, Nature.

[30]  Leon Danon,et al.  Comparing community structure identification , 2005, cond-mat/0505245.

[31]  Mark E. J. Newman,et al.  Stochastic blockmodels and community structure in networks , 2010, Physical review. E, Statistical, nonlinear, and soft matter physics.

[32]  John Yen,et al.  An LDA-based Community Structure Discovery Approach for Large-Scale Social Networks , 2007, 2007 IEEE Intelligence and Security Informatics.

[33]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[34]  Brian W. Kernighan,et al.  An efficient heuristic procedure for partitioning graphs , 1970, Bell Syst. Tech. J..

[35]  Edoardo M. Airoldi,et al.  Mixed Membership Stochastic Blockmodels , 2007, NIPS.

[36]  Xiang-Sun Zhang,et al.  Modularity optimization in community detection of complex networks , 2009 .

[37]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[38]  Tina Eliassi-Rad,et al.  Applying latent dirichlet allocation to group discovery in large graphs , 2009, SAC '09.

[39]  Samuel Kaski,et al.  A Block Model Suitable for Sparse Graphs , 2009 .

[40]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[41]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[42]  W. Zachary,et al.  An Information Flow Model for Conflict and Fission in Small Groups , 1977, Journal of Anthropological Research.

[43]  S. Fortunato,et al.  Resolution limit in community detection , 2006, Proceedings of the National Academy of Sciences.

[44]  Stephen Roberts,et al.  Overlapping community detection using Bayesian non-negative matrix factorization. , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[45]  H. Akaike A new look at the statistical model identification , 1974 .

[46]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[47]  Yuchung J. Wang,et al.  Stochastic Blockmodels for Directed Graphs , 1987 .

[48]  James P. Bagrow Evaluating local community methods in networks , 2007, 0706.3880.

[49]  András A. Benczúr,et al.  An efficient block model for clustering sparse graphs , 2010, MLG '10.

[50]  Chris H. Q. Ding,et al.  On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering , 2005, SDM.

[51]  Ioannis Psorakis,et al.  Soft Partitioning in Networks via Bayesian Non-negative Matrix Factorization , 2010 .