Community Detection in Hypergraphs: Optimal Statistical Limit and Efficient Algorithms

This paper studies community detection in hypergraphs. Under a generative model called the “d-wise hypergraph stochastic block model” (d-hSBM), which naturally extends the Stochastic Block Model (SBM) from graphs to d-uniform hypergraphs, we characterize the fundamental limit on the asymptotic minimax misclassified ratio. To prove achievability, we propose a two-step polynomial-time algorithm that provably achieves the fundamental limit in the sparse hypergraph regime. To prove optimality, we lower-bound the minimax risk by restricting to a smaller parameter space that still contains the dominant error events, a construction inspired by the achievability analysis. The minimax risk turns out to decay exponentially fast to zero as the number of nodes tends to infinity, and the rate function is a weighted combination of several divergence terms, each of which is the Rényi divergence of order 1/2 between two Bernoulli distributions. These Bernoulli distributions are the ones governing the random instantiation of hyperedges in d-hSBM. Experimental results on both synthetic and real-world data validate our theoretical findings.
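
As a rough illustration of the divergence terms mentioned above, the following Python sketch evaluates the Rényi divergence of order 1/2 between Bernoulli(p) and Bernoulli(q), which has the closed form -2 log(sqrt(pq) + sqrt((1-p)(1-q))), and forms a weighted sum of such terms. The function names `renyi_half_bernoulli` and `weighted_rate`, and the weights and probability pairs passed to them, are hypothetical: the abstract does not state the actual weights or hyperedge probabilities, so this shows only the shape of the rate function, not the paper's exact expression.

```python
import math


def renyi_half_bernoulli(p: float, q: float) -> float:
    """Rényi divergence of order 1/2 between Bernoulli(p) and Bernoulli(q), in nats."""
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("p and q must lie in [0, 1]")
    # Bhattacharyya coefficient of the two Bernoulli distributions.
    bc = math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q))
    # Guard against floating-point overshoot slightly above 1.
    bc = min(bc, 1.0)
    if bc == 0.0:
        # Disjoint supports: the divergence is infinite.
        return math.inf
    return -2.0 * math.log(bc)


def weighted_rate(pairs, weights):
    """Weighted sum of order-1/2 Rényi divergences.

    `pairs` is a list of (p, q) hyperedge probabilities and `weights` the
    matching non-negative weights; both are illustrative assumptions here.
    """
    if len(pairs) != len(weights):
        raise ValueError("pairs and weights must have the same length")
    return sum(w * renyi_half_bernoulli(p, q) for w, (p, q) in zip(weights, pairs))
```

For example, `weighted_rate([(0.02, 0.01), (0.015, 0.01)], [1.0, 3.0])` combines two divergence terms with made-up weights. The divergence is symmetric in p and q and vanishes when p equals q, so pairs of nearly equal hyperedge probabilities contribute little to the rate.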
