A Generic Sample Splitting Approach for Refined Community Recovery in Stochastic Block Models

We propose and analyze a generic method for community recovery in stochastic block models and degree corrected block models. This approach can exactly recover the hidden communities with high probability when the expected node degrees are of order $\log n$ or higher. Starting from a roughly correct community partition given by some conventional community recovery algorithm, this method refines the partition in a cross clustering step. Our results simplify and extend some of the previous work on exact community recovery, discovering the key role played by sample splitting. The proposed method is simple and can be implemented with many practical community recovery algorithms.

[1]  Peter J. Bickel,et al.  Pseudo-likelihood methods for community detection in large sparse networks , 2012, 1207.2340.

[2]  Amin Coja-Oghlan,et al.  Graph Partitioning via Adaptive Spectral Techniques , 2009, Combinatorics, Probability and Computing.

[3]  A. Rinaldo,et al.  Consistency of spectral clustering in stochastic block models , 2013, 1312.2050.

[4]  Jing Lei,et al.  Generic Sample Splitting for Refined Community Recovery in Degree Corrected Stochastic Block Models , 2016 .

[5]  Sujay Sanghavi,et al.  Clustering Sparse Graphs , 2012, NIPS.

[6]  Can M. Le,et al.  Optimization via Low-rank Approximation, with Applications to Community Detection in Networks , 2014, ArXiv.

[7]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[8]  M. Skala Hypergeometric tail inequalities: ending the insanity , 2013, 1311.5939.

[9]  P. Bickel,et al.  A nonparametric view of network models and Newman–Girvan and other modularities , 2009, Proceedings of the National Academy of Sciences.

[10]  Xiangyu Chang,et al.  Asymptotic Normality of Maximum Likelihood and its Variational Approximation for Stochastic Blockmodels , 2012, ArXiv.

[11]  Thomas L. Griffiths,et al.  Learning Systems of Concepts with an Infinite Relational Model , 2006, AAAI.

[12]  Bin Yu,et al.  Spectral clustering and the high-dimensional stochastic blockmodel , 2010, 1007.1684.

[13]  Jiashun Jin,et al.  FAST COMMUNITY DETECTION BY SCORE , 2012, 1211.5803.

[14]  Alessandro Rinaldo,et al.  Consistency of Spectral Clustering in Sparse Stochastic Block Models , 2013 .

[15]  Anima Anandkumar,et al.  A tensor approach to learning mixed membership community models , 2013, J. Mach. Learn. Res..

[16]  Mark E. J. Newman,et al.  Stochastic blockmodels and community structure in networks , 2010, Physical review. E, Statistical, nonlinear, and soft matter physics.

[17]  Cristopher Moore,et al.  Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[18]  Cristopher Moore,et al.  Model selection for degree-corrected block models , 2012, Journal of statistical mechanics.

[19]  Lada A. Adamic,et al.  The political blogosphere and the 2004 U.S. election: divided they blog , 2005, LinkKDD '05.

[20]  Elchanan Mossel,et al.  Belief propagation, robust reconstruction and optimal recovery of block models , 2013, COLT.

[21]  Bruce E. Hajek,et al.  Achieving exact cluster recovery threshold via semidefinite programming , 2015, 2015 IEEE International Symposium on Information Theory (ISIT).

[22]  Laurent Massoulié,et al.  Community detection thresholds and the weak Ramanujan property , 2013, STOC.

[23]  Can M. Le,et al.  Optimization via Low-rank Approximation for Community Detection in Networks , 2014 .

[24]  Fan Chung Graham,et al.  Spectral Clustering of Graphs with General Degrees in the Extended Planted Partition Model , 2012, COLT.

[25]  Richard M. Karp,et al.  Algorithms for graph partitioning on the planted partition model , 2001, Random Struct. Algorithms.

[26]  Alexandre Proutière,et al.  Accurate Community Detection in the Stochastic Block Model via Spectral Algorithms , 2014, ArXiv.

[27]  Carey E. Priebe,et al.  Consistent Adjacency-Spectral Partitioning for the Stochastic Block Model When the Model Parameters Are Unknown , 2012, SIAM J. Matrix Anal. Appl..

[28]  Alain Celisse,et al.  Consistency of maximum-likelihood and variational estimators in the Stochastic Block Model , 2011, 1105.3288.

[29]  Elchanan Mossel,et al.  Spectral redemption in clustering sparse networks , 2013, Proceedings of the National Academy of Sciences.

[30]  Frank McSherry,et al.  Spectral partitioning of random graphs , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[31]  Ji Zhu,et al.  Consistency of community detection in networks under degree-corrected stochastic block models , 2011, 1110.3854.

[32]  Emmanuel Abbe,et al.  Exact Recovery in the Stochastic Block Model , 2014, IEEE Transactions on Information Theory.

[33]  Emmanuel Abbe,et al.  Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms , 2015, ArXiv.

[34]  Franck Picard,et al.  A mixture model for random graphs , 2008, Stat. Comput..

[35]  Kathryn B. Laskey,et al.  Stochastic blockmodels: First steps , 1983 .

[36]  S. Wasserman,et al.  Blockmodels: Interpretation and evaluation , 1992 .

[37]  Elchanan Mossel,et al.  Consistency Thresholds for Binary Symmetric Block Models , 2014, ArXiv.