Detecting Overlapping Communities in Networks Using Spectral Methods

Community detection is a fundamental problem in network analysis, made more challenging by the overlaps between communities that often occur in practice. Here we propose a general, flexible, and interpretable generative model for overlapping communities, which can be thought of as a generalization of the degree-corrected stochastic block model. We develop an efficient spectral algorithm for estimating the community memberships, which handles the overlaps by using the K-medians algorithm rather than the usual K-means for clustering in the spectral domain. We show that the algorithm is asymptotically consistent when networks are not too sparse and the overlaps between communities are not too large. Numerical experiments on both simulated networks and many real social networks demonstrate that our method performs very well compared to a number of benchmark methods for overlapping community detection.
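
To make the idea of clustering in the spectral domain with K-medians concrete, the following is a minimal sketch and not the algorithm of the paper. It rests on several assumptions that the abstract does not specify: the embedding uses the K leading eigenvectors of the unregularized adjacency matrix, the rows are scaled to unit length, and K-medians is implemented as a Lloyd-style iteration with L1 distances and coordinate-wise medians. The function names `spectral_embedding` and `k_medians` and the toy graph are hypothetical, and the step that turns cluster centers into overlapping memberships is omitted.

```python
import numpy as np

def spectral_embedding(A, K):
    """Rows of the K leading eigenvectors (by |eigenvalue|), scaled to unit length."""
    vals, vecs = np.linalg.eigh(A)           # A is symmetric
    top = np.argsort(-np.abs(vals))[:K]      # indices of the K largest |eigenvalues|
    X = vecs[:, top]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)      # guard against zero rows

def k_medians(X, K, n_iter=100):
    """Lloyd-style K-medians (assumed variant): L1 assignment, coordinate-wise median update."""
    # Deterministic farthest-point initialization, starting from the first row.
    centers = [X[0]]
    for _ in range(1, K):
        d = np.min([np.abs(X - c).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(K):
            if np.any(labels == k):          # keep the old center if a cluster is empty
                new_centers[k] = np.median(X[labels == k], axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy graph (illustrative only): two 5-node cliques joined by the single edge 4-5.
A = np.zeros((10, 10))
A[:5, :5] = 1
A[5:, 5:] = 1
np.fill_diagonal(A, 0)
A[4, 5] = A[5, 4] = 1

X = spectral_embedding(A, K=2)
centers, labels = k_medians(X, K=2)
print(labels)
```

The median update is less sensitive than the mean to the few rows that sit between clusters, which is the usual motivation for preferring medians when some nodes belong to more than one community.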
