Community Recovery in Graphs with Locality

Motivated by applications in domains such as social networks and computational biology, we study the problem of community recovery in graphs with locality. In this problem, noisy pairwise measurements of whether two nodes belong to the same community or to different communities come mainly or exclusively from nearby nodes, rather than being sampled uniformly across all node pairs as in most existing models. We present an algorithm whose running time is nearly linear in the number of measurements and which achieves the information-theoretic limit for exact recovery.
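To make the measurement model concrete, the following is a minimal sketch of how locality-restricted pairwise measurements could be simulated. It is illustrative only: the layout of nodes on a line, the two-community labels, the `radius` cutoff, the XOR parity encoding, and the independent flip probability `flip_prob` are all assumptions chosen for the example, not details stated in the abstract, and the function name `sample_local_measurements` is hypothetical.

```python
import random


def sample_local_measurements(n, radius, flip_prob, seed=0):
    """Simulate noisy same/different measurements between nearby nodes.

    Assumptions (not from the abstract): nodes 0..n-1 sit on a line, each
    has a binary community label, and a pair (i, j) is measured only when
    0 < j - i <= radius. A measurement is 0 for "same community" and 1 for
    "different", flipped independently with probability flip_prob.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    measurements = {}
    for i in range(n):
        # Only nodes within `radius` positions of i are measured against it.
        for j in range(i + 1, min(i + radius, n - 1) + 1):
            parity = labels[i] ^ labels[j]  # 0 = same, 1 = different
            noise = 1 if rng.random() < flip_prob else 0
            measurements[(i, j)] = parity ^ noise
    return labels, measurements


if __name__ == "__main__":
    labels, measurements = sample_local_measurements(n=10, radius=2, flip_prob=0.1)
    print(len(measurements), "measurements between nearby pairs")
```

With a small `radius`, the number of measurements grows linearly in `n` rather than quadratically, which is the regime where a recovery algorithm that is nearly linear in the number of measurements is most meaningful.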
