Graph Clustering With Missing Data: Convex Algorithms and Analysis

We consider the problem of finding clusters in an unweighted graph when the graph is only partially observed. We analyze two programs based on the convex optimization approach to low-rank matrix recovery via nuclear norm minimization: one that works for dense graphs, and one that works for both sparse and dense graphs but requires some a priori knowledge of the total cluster size. For the commonly used Stochastic Block Model, we obtain explicit bounds on the problem parameters (the size and sparsity of the clusters and the amount of observed data) and on the regularization parameter that characterize the success and failure of the programs. We corroborate our theoretical findings through extensive simulations. We also run our algorithm on a real data set obtained by crowdsourcing an image classification task on Amazon Mechanical Turk, and observe a significant performance improvement over traditional methods such as k-means.
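To make the setting concrete, the following is a minimal sketch of how a partially observed graph could be drawn from a Stochastic Block Model. It is an illustration under stated assumptions, not the construction used in the paper: the function name `sample_partial_sbm`, the parameter names `p` (edge probability inside a cluster), `q` (edge probability between clusters), and `r` (probability that a node pair is observed), and the convention of a unit diagonal are all hypothetical choices made here.

```python
import numpy as np

def sample_partial_sbm(sizes, p, q, r, seed=0):
    """Draw a symmetric SBM adjacency matrix and a symmetric observation mask.

    sizes: list of cluster sizes (assumed disjoint clusters covering all nodes)
    p, q:  assumed edge probabilities inside / across clusters
    r:     assumed probability that a given node pair is observed
    """
    rng = np.random.default_rng(seed)
    n = sum(sizes)
    # Cluster label of every node, e.g. sizes=[2, 3] -> [0, 0, 1, 1, 1].
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)

    # Sample each unordered pair once (strict upper triangle), then mirror it.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(int)
    np.fill_diagonal(adj, 1)  # illustrative convention: a node is linked to itself

    # Independent symmetric mask marking which pairs were actually observed.
    obs_upper = np.triu(rng.random((n, n)) < r, k=1)
    observed = obs_upper | obs_upper.T
    np.fill_diagonal(observed, True)
    return adj, observed, labels
```

With `p = 1` and `q = 0` the adjacency matrix is exactly block diagonal, which is the low-rank structure that a nuclear norm approach tries to recover; noise (`p < 1`, `q > 0`) and missing entries (`r < 1`) perturb that ideal matrix.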
