A survey on theoretical advances of community detection in networks

Real-world networks usually have community structure; that is, nodes are grouped into densely connected communities. Community detection is one of the most popular and best-studied research topics in network science and has attracted attention in many fields, including computer science, statistics, and the social sciences. Numerous approaches to community detection have been proposed in the literature, from ad hoc algorithms to systematic model-based approaches. The large number of available methods raises a fundamental question: can a given method provide consistent estimates of the community labels? The stochastic blockmodel (SBM) and its variants provide a convenient framework for studying such problems. This article is a survey of recent theoretical advances in community detection. The authors review a number of community detection methods and their theoretical properties, including graph cut methods, profile likelihoods, the pseudo-likelihood method, the variational method, belief propagation, spectral clustering, and semidefinite relaxations of the SBM. The authors also briefly discuss other research topics in community detection, such as robust community detection, community detection with nodal covariates, and model selection, and suggest a few possible directions for future research. WIREs Comput Stat 2017, 9:e1403. doi: 10.1002/wics.1403
