Scalable Inference of Overlapping Communities

We develop a scalable algorithm for posterior inference of overlapping communities in large networks. Our algorithm is based on stochastic variational inference in the mixed-membership stochastic blockmodel (MMSB). It naturally interleaves subsampling the network with estimating its community structure. We apply our algorithm to ten large, real-world networks with up to 60,000 nodes. It converges several orders of magnitude faster than the state-of-the-art algorithm for MMSB, finds hundreds of communities in large real-world networks, and detects the true communities in 280 benchmark networks with accuracy equal to or better than that of other scalable algorithms.
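To make the two ingredients named above concrete, the following is a minimal sketch and not the paper's implementation. It assumes an assortative variant of the MMSB, in which two nodes link with probability `beta` when their pair-specific community indicators agree and with a small probability `epsilon` otherwise, and it shows the generic Robbins-Monro style update that stochastic variational inference applies to a global parameter. The function names, default values, and the scalar `beta` shared by all communities are illustrative assumptions.

```python
import numpy as np


def sample_mmsb(n_nodes, n_communities, alpha=0.5, beta=0.8, epsilon=1e-3, seed=0):
    """Draw an undirected network from an assortative MMSB (illustrative defaults)."""
    rng = np.random.default_rng(seed)
    # theta[i] is node i's mixed-membership vector over communities.
    theta = rng.dirichlet(np.full(n_communities, alpha), size=n_nodes)
    adjacency = np.zeros((n_nodes, n_nodes), dtype=int)
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            # Each node picks a community indicator for this specific pair.
            z_ij = rng.choice(n_communities, p=theta[i])
            z_ji = rng.choice(n_communities, p=theta[j])
            # Assumption: one shared within-community link probability.
            p_link = beta if z_ij == z_ji else epsilon
            adjacency[i, j] = adjacency[j, i] = int(rng.random() < p_link)
    return theta, adjacency


def step_size(t, tau0=1.0, kappa=0.7):
    """Decaying step size (tau0 + t)^(-kappa); kappa in (0.5, 1] is the usual range."""
    return (tau0 + t) ** (-kappa)


def noisy_update(current, estimate_from_subsample, rho):
    """Blend the current global parameter with an estimate from a subsample."""
    return (1.0 - rho) * current + rho * estimate_from_subsample


if __name__ == "__main__":
    theta, adjacency = sample_mmsb(n_nodes=50, n_communities=4)
    print("edges:", adjacency.sum() // 2)
    print("step sizes:", [round(step_size(t), 3) for t in range(3)])
```

In a full algorithm, `estimate_from_subsample` would be computed from a sampled set of node pairs at each iteration, which is what allows subsampling and estimation to alternate; how the pairs are sampled and how the estimate is formed are not specified in this abstract and are left out here.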
