Hypergraph Clustering in the Weighted Stochastic Block Model via Convex Relaxation of Truncated MLE

We study hypergraph clustering under the weighted $d$-uniform hypergraph stochastic block model ($d$-WHSBM), in which each edge consists of $d$ nodes and has a higher expected weight when all $d$ nodes belong to the same community than when they come from different communities. We propose a new hypergraph clustering algorithm, a convex relaxation of the truncated maximum likelihood estimator (CRTMLE), that can handle the relatively sparse, high-dimensional regime of the $d$-WHSBM with community sizes of different orders. We provide performance guarantees for this algorithm under a unified framework covering different parameter regimes, and show that it achieves order-wise optimal or best-known results for approximately balanced community sizes. We also establish the first recovery guarantees for the setting with a growing number of communities of unbalanced sizes.
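
To make the model concrete, the following is a minimal sketch of sampling from one special case of the $d$-WHSBM. It assumes Bernoulli edge weights with success probability `p` for edges whose $d$ nodes share a community and `q < p` otherwise; the abstract only requires a higher expected weight for within-community edges, so this weight distribution, the function name `sample_whsbm`, and all parameter values are illustrative assumptions. The sketch covers the data model only, not the CRTMLE algorithm.

```python
import itertools
import random


def sample_whsbm(labels, d, p, q, seed=0):
    """Sample a weight for every d-subset of nodes (hypothetical helper).

    Assumption: weights are Bernoulli(p) when all d nodes share a label
    and Bernoulli(q) otherwise. This is one simple instance of the model.
    """
    rng = random.Random(seed)
    weights = {}
    for edge in itertools.combinations(range(len(labels)), d):
        # An edge is "within-community" only if all d nodes share one label.
        same_community = len({labels[v] for v in edge}) == 1
        prob = p if same_community else q
        weights[edge] = 1 if rng.random() < prob else 0
    return weights


# Two planted communities of four nodes each, 3-uniform edges.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
W = sample_whsbm(labels, d=3, p=0.9, q=0.1)
```

Enumerating all $d$-subsets is only practical for small node counts; it is used here for clarity rather than efficiency.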
