Community Recovery in Hypergraphs

Data clustering is a core problem in many fields of science and engineering. Community recovery in graphs is one popular approach to data clustering, and it has received significant attention because of its wide applicability to social network analysis, protein complex detection, shape matching, image segmentation, and other tasks. While community recovery in graphs has been studied extensively in the literature, community recovery in hypergraphs has received comparatively little attention. In this paper, we study the generalized Censored Block Model (CBM), in which the observations are randomly chosen hyperedges of size d, each associated with the modulo-2 sum of the values of the nodes in the hyperedge, corrupted by Bernoulli noise. We characterize the information-theoretic limit of community recovery in hypergraphs under this model. Our results hold for the general case in which d scales arbitrarily.
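The observation model described above can be illustrated with a minimal sketch. The function name `sample_censored_hyperedges` and the parameters `num_edges`, `theta`, and `seed` are hypothetical, and the sketch assumes binary node values and hyperedges drawn uniformly at random with replacement; the abstract does not specify the sampling scheme, so this is one plausible reading rather than the paper's exact construction.

```python
import random


def sample_censored_hyperedges(labels, d, num_edges, theta, seed=0):
    """Draw noisy parity observations on random size-d hyperedges.

    labels:    list of 0/1 node values (assumed binary communities)
    d:         hyperedge size
    num_edges: number of hyperedges to observe (assumed sampling budget)
    theta:     probability that an observation is flipped (Bernoulli noise)
    """
    rng = random.Random(seed)
    n = len(labels)
    observations = []
    for _ in range(num_edges):
        # Choose d distinct nodes uniformly at random.
        edge = tuple(sorted(rng.sample(range(n), d)))
        # Modulo-2 sum of the node values in the hyperedge.
        parity = sum(labels[v] for v in edge) % 2
        # Flip the parity with probability theta.
        noise = 1 if rng.random() < theta else 0
        observations.append((edge, parity ^ noise))
    return observations


if __name__ == "__main__":
    labels = [0, 1, 1, 0, 1, 0, 0, 1]
    for edge, y in sample_censored_hyperedges(labels, d=3, num_edges=5, theta=0.1):
        print(edge, y)
```

With `theta = 0` every observation equals the true parity of its hyperedge, and as `theta` approaches 1/2 the observations carry less and less information about the node values.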
