Efficient discovery of overlapping communities in massive networks

Detecting overlapping communities is essential to analyzing and exploring natural networks such as social networks, biological networks, and citation networks. However, most existing approaches do not scale to the size of networks that we regularly observe in the real world. In this paper, we develop a scalable approach to community detection that discovers overlapping communities in massive real-world networks. Our approach is based on a Bayesian model of networks that allows nodes to participate in multiple communities, and a corresponding algorithm that naturally interleaves subsampling from the network and updating an estimate of its communities. We demonstrate how we can discover the hidden community structure of several real-world networks, including 3.7 million US patents, 575,000 physics articles from the arXiv preprint server, and 875,000 connected Web pages from the Internet. Furthermore, we demonstrate on large simulated networks that our algorithm accurately discovers the true community structure. This paper opens the door to using sophisticated statistical models to analyze massive networks.
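The idea of interleaving subsampling with updating an estimate can be illustrated with a deliberately simplified sketch. This is not the paper's inference algorithm and it does not estimate communities. It assumes a much simpler target (the overall link density of an undirected network) and a Robbins-Monro style decaying step size. The function name `estimate_link_rate` and the parameters `tau` and `kappa` are hypothetical, chosen only for illustration.

```python
import random


def estimate_link_rate(edges, num_nodes, num_steps=2000, batch_size=50,
                       tau=1.0, kappa=0.7, seed=0):
    """Estimate the link density of an undirected network by repeatedly
    subsampling node pairs and nudging a running estimate.

    Illustrative assumption: the quantity tracked is a single number
    (fraction of node pairs that are linked), not community memberships.
    """
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    estimate = 0.5  # arbitrary starting guess

    for t in range(num_steps):
        # Step 1: subsample a small batch of node pairs from the network.
        linked = 0
        for _ in range(batch_size):
            a, b = rng.sample(range(num_nodes), 2)
            linked += frozenset((a, b)) in edge_set
        noisy_estimate = linked / batch_size

        # Step 2: update the running estimate with a decaying step size.
        # kappa in (0.5, 1] makes the step sizes satisfy the usual
        # stochastic approximation conditions.
        rho = (t + tau) ** (-kappa)
        estimate = (1.0 - rho) * estimate + rho * noisy_estimate

    return estimate
```

Each iteration touches only `batch_size` node pairs, so the cost per update does not depend on the size of the network. That property is what makes a subsample-and-update scheme attractive for massive graphs.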
