Hypergraph Spectral Clustering in the Weighted Stochastic Block Model

Spectral clustering is a celebrated algorithm that partitions objects based on pairwise similarity information. While this approach has been successfully applied in a variety of domains, it has a limitation: in many applications, only multiway similarity measures are available. This motivates us to explore the multiway measurement setting. In this paper, we develop two algorithms for this setting: hypergraph spectral clustering (HSC) and hypergraph spectral clustering with local refinement (HSCLR). Our main contribution is a performance analysis of these polynomial-time algorithms under a random hypergraph model, which we name the weighted stochastic block model, in which objects and multiway measures are modeled as nodes and weights of hyperedges, respectively. Denoting by $n$ the number of nodes, our analysis reveals the following: 1) HSC outputs a partition that is better than a random guess if the sum of edge weights (to be explained later) is $\Omega(n)$; 2) HSC outputs a partition that coincides with the hidden partition except for a vanishing fraction of nodes if the sum of edge weights is $\omega(n)$; and 3) HSCLR exactly recovers the hidden partition if the sum of edge weights is on the order of $n \log n$. Our results improve upon the state of the art recently established under the model, and they are the first to settle the orderwise optimal results for the binary edge weight case. Moreover, we show that our results lead to efficient sketching algorithms for subspace clustering, a computer vision application. Finally, we show that HSCLR achieves the information-theoretic limits for a special yet practically relevant model, thereby showing that there is no computational barrier in this case.
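
To make the setting concrete, the following is a minimal sketch of spectral clustering from multiway measurements. It is not the HSC or HSCLR procedure analyzed in the paper, whose details are not given in this abstract. It assumes two clusters, collapses each weighted hyperedge into pairwise similarities, and splits the nodes by the sign of the second leading eigenvector. The function names `pairwise_similarity` and `spectral_bipartition` and the toy hypergraph are hypothetical, chosen only for illustration.

```python
from itertools import combinations

import numpy as np


def pairwise_similarity(n, hyperedges, weights):
    """Collapse weighted hyperedges into an n x n similarity matrix.

    Entry (i, j) is the total weight of the hyperedges containing both i and j.
    This projection is an illustrative assumption, not the paper's construction.
    """
    A = np.zeros((n, n))
    for nodes, w in zip(hyperedges, weights):
        for i, j in combinations(nodes, 2):
            A[i, j] += w
            A[j, i] += w
    return A


def spectral_bipartition(A):
    """Split nodes into two groups by the sign of the second leading eigenvector."""
    # eigh returns eigenvalues in ascending order, so column -2 belongs to
    # the second-largest eigenvalue of the symmetric matrix A.
    _, vecs = np.linalg.eigh(A)
    v = vecs[:, -2]
    return (v >= 0).astype(int)


# Toy 3-uniform hypergraph: two heavy within-cluster hyperedges
# and one light hyperedge that crosses the two clusters.
hyperedges = [(0, 1, 2), (3, 4, 5), (2, 3, 4)]
weights = [1.0, 1.0, 0.05]
labels = spectral_bipartition(pairwise_similarity(6, hyperedges, weights))
print(labels)
```

The labels are determined only up to swapping the two groups, since the sign of an eigenvector is arbitrary. A local refinement step in the spirit of HSCLR would start from such a partition and then revisit each node individually.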
