Robust Hypergraph Clustering via Convex Relaxation of Truncated MLE

We study hypergraph clustering in the weighted <inline-formula> <tex-math notation="LaTeX">$d$ </tex-math></inline-formula>-uniform hypergraph stochastic block model (<inline-formula> <tex-math notation="LaTeX">$d$ </tex-math></inline-formula><italic>-WHSBM</italic>), where each edge consisting of <inline-formula> <tex-math notation="LaTeX">$d$ </tex-math></inline-formula> nodes from the same community has a higher expected weight than edges consisting of nodes from different communities. We propose a new hypergraph clustering algorithm, called <italic>CRTMLE</italic>, and provide performance guarantees for it under the <inline-formula> <tex-math notation="LaTeX">$d$ </tex-math></inline-formula><italic>-WHSBM</italic> for general parameter regimes. We show that the proposed method achieves order-wise optimal results, or matches the best existing results, for approximately balanced community sizes. Moreover, our results establish the first recovery guarantees for a growing number of clusters of unbalanced sizes. Through theoretical analysis and empirical results, we demonstrate the robustness of our algorithm against unbalanced community sizes and the presence of outlier nodes.
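To make the model concrete, the following is a minimal sketch of sampling edge weights under a simplified version of the model. It assumes Bernoulli (0/1) weights, and it treats every edge whose nodes are not all in one community as a cross-community edge. The function name `sample_whsbm` and the parameters `labels`, `p_in`, and `p_out` are hypothetical names chosen for illustration and do not come from the paper. The sketch covers only the generative model, not the CRTMLE algorithm.

```python
import itertools
import random


def sample_whsbm(labels, d, p_in, p_out, seed=0):
    """Sample 0/1 edge weights for a d-uniform hypergraph (illustrative only).

    labels: community label of each node (hypothetical input format).
    p_in:   expected weight of an edge whose d nodes share one community.
    p_out:  expected weight of every other edge; assumed smaller than p_in.
    """
    if not 0.0 <= p_out < p_in <= 1.0:
        raise ValueError("need 0 <= p_out < p_in <= 1")
    rng = random.Random(seed)
    weights = {}
    # Enumerate every size-d subset of nodes as a candidate hyperedge.
    for edge in itertools.combinations(range(len(labels)), d):
        same_community = len({labels[v] for v in edge}) == 1
        mean = p_in if same_community else p_out
        # Assumption: Bernoulli weight with the prescribed mean.
        weights[edge] = 1 if rng.random() < mean else 0
    return weights


# Two communities of four nodes each, 3-uniform hyperedges.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
weights = sample_whsbm(labels, d=3, p_in=0.9, p_out=0.1)
```

The enumeration visits all size-<inline-formula> <tex-math notation="LaTeX">$d$ </tex-math></inline-formula> subsets, so the sketch is practical only for small node counts.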
