Spectral Clustering of Graphs with General Degrees in the Extended Planted Partition Model

In this paper, we examine a spectral clustering algorithm for similarity graphs drawn from a simple random graph model, where nodes are allowed to have varying degrees, and we provide theoretical bounds on its performance. The random graph model we study is the Extended Planted Partition (EPP) model, a variant of the classical planted partition model. The standard approach to spectral clustering of graphs is to compute the bottom k singular vectors or eigenvectors of a suitable graph Laplacian, project the nodes of the graph onto these vectors, and then use an iterative clustering algorithm on the projected nodes. However a challenge with applying this approach to graphs generated from the EPP model is that unnormalized Laplacians do not work, and normalized Laplacians do not concentrate well when the graph has a number of low degree nodes. We resolve this issue by introducing the notion of a degree-corrected graph Laplacian. For graphs with many low degree nodes, degree correction has a regularizing eect on the Laplacian. Our spectral clustering algorithm projects the nodes in the graph onto the bottom k right singular vectors of the degree-corrected random-walk Laplacian, and clusters the nodes in this subspace. We show guarantees on the performance of this algorithm, demonstrating that it outputs the correct partition under a wide range of parameter values. Unlike some previous work, our algorithm does not require access to any generative parameters of the model.

[1]  Amit Kumar,et al.  Clustering with Spectral Norm and the k-Means Algorithm , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[2]  Sanjeev Arora,et al.  Learning mixtures of arbitrary gaussians , 2001, STOC '01.

[3]  Satish Rao,et al.  Learning Mixtures of Product Distributions Using Correlations and Independence , 2008, COLT.

[4]  Sanjoy Dasgupta,et al.  Learning mixtures of Gaussians , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).

[5]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[6]  Santosh S. Vempala,et al.  The Spectral Method for General Mixture Models , 2008, SIAM J. Comput..

[7]  R. Oliveira Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges , 2009, 0911.0600.

[8]  F. Chung,et al.  Complex Graphs and Networks , 2006 .

[9]  W. Hoeffding Probability Inequalities for sums of Bounded Random Variables , 1963 .

[10]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  Anirban Dasgupta,et al.  Spectral analysis of random graphs with skewed degree distributions , 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[12]  Bin Yu,et al.  Spectral clustering and the high-dimensional stochastic blockmodel , 2010, 1007.1684.

[13]  Sivaraman Balakrishnan,et al.  Noise Thresholds for Spectral Clustering , 2011, NIPS.

[14]  Mikhail Belkin,et al.  Consistency of spectral clustering , 2008, 0804.0678.

[15]  Mark Jerrum,et al.  Simulated annealing for graph bisection , 1993, Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science.

[16]  Philip M. Long,et al.  Finding Planted Partitions in Nearly Linear Time using Arrested Spectral Clustering , 2010, ICML.

[17]  Ravi B. Boppana,et al.  Eigenvalues and graph bisection: An average-case analysis , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[18]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[19]  Mark E. J. Newman,et al.  Stochastic blockmodels and community structure in networks , 2010, Physical review. E, Statistical, nonlinear, and soft matter physics.

[20]  Dimitris Achlioptas,et al.  On Spectral Learning of Mixtures of Distributions , 2005, COLT.

[21]  Fan Chung Graham,et al.  On the Spectra of General Random Graphs , 2011, Electron. J. Comb..

[22]  Edoardo M. Airoldi,et al.  Stochastic blockmodels with growing number of classes , 2010, Biometrika.

[23]  U. Feige,et al.  Spectral Graph Theory , 2015 .

[24]  Frank McSherry,et al.  Spectral partitioning of random graphs , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[25]  Amin Coja-Oghlan,et al.  Finding Planted Partitions in Random Graphs with General Degree Distributions , 2009, SIAM J. Discret. Math..

[26]  Christos H. Papadimitriou,et al.  On the Eigenvalue Power Law , 2002, RANDOM.

[27]  Richard M. Karp,et al.  Algorithms for graph partitioning on the planted partition model , 2001, Random Struct. Algorithms.

[28]  Joel A. Tropp,et al.  User-Friendly Tail Bounds for Sums of Random Matrices , 2010, Found. Comput. Math..