Consistency of spectral clustering

Consistency is a key property of all statistical procedures analyzing randomly sampled data. Surprisingly, despite decades of work, little is known about the consistency of most clustering algorithms. In this paper we investigate the consistency of the popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that, as the sample size increases, these eigenvectors converge to the eigenvectors of certain limit operators. As a result, we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is consistent only under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.
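To make the two classes concrete, the following is a minimal sketch of the objects whose eigenvectors are being studied, not the paper's own algorithm. It assumes a fully connected similarity graph with Gaussian weights of bandwidth `sigma`, and it splits the points into two groups by the sign of the eigenvector belonging to the second-smallest eigenvalue. The unnormalized variant uses L = D - W. The normalized variant uses L_sym = I - D^{-1/2} W D^{-1/2}. The function names `gaussian_similarity`, `graph_laplacian`, and `spectral_bipartition` are hypothetical. Practical implementations usually run k-means on several eigenvectors rather than thresholding one of them.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Fully connected similarity graph with Gaussian weights (an assumed choice)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def graph_laplacian(W, normalized=True):
    """Return L = D - W, or L_sym = I - D^{-1/2} W D^{-1/2} if normalized."""
    d = W.sum(axis=1)  # vertex degrees
    if not normalized:
        return np.diag(d) - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def spectral_bipartition(W, normalized=True):
    """Split the vertices into two groups by the sign of the second eigenvector."""
    L = graph_laplacian(W, normalized)
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    v = eigvecs[:, 1]  # eigenvector of the second-smallest eigenvalue
    if normalized:
        # Map to an eigenvector of L_rw = D^{-1}(D - W) via u = D^{-1/2} v.
        # The signs are unchanged, so this only matters if the entries are
        # used beyond their sign (e.g. as k-means features).
        v = v / np.sqrt(W.sum(axis=1))
    return (v > 0).astype(int)
```

On two well-separated point clouds, both variants should return one label per cloud, although the label values themselves may be swapped. The distinction the paper draws between the normalized and unnormalized variants concerns the limit as the sample size grows. A small toy example like this one does not exhibit that distinction.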
