The geometry of kernelized spectral clustering

Clustering of data sets is a standard problem in many areas of science and engineering. Spectral clustering embeds the data set using a kernel function and uses the top eigenvectors of the normalized Laplacian to recover the connected components of the associated similarity graph. We study the performance of spectral clustering in recovering the latent labels of i.i.d. samples from a finite mixture of nonparametric distributions. The difficulty of this label recovery problem depends on the overlap between mixture components and on how easily a mixture component can be divided into two nonoverlapping pieces. When the overlap is small relative to the indivisibility of the mixture components, the principal eigenspace of the population-level normalized Laplacian operator is approximately spanned by the square-root kernelized component densities. In the finite-sample setting, under the same assumptions, embedded samples from different components are approximately orthogonal with high probability when the sample size is large. As a corollary, we control the fraction of samples mislabeled by spectral clustering under finite mixtures with nonparametric components.

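For concreteness, the following is a minimal sketch in Python of the embedding step the abstract describes, assuming a Gaussian kernel. The function name spectral_embedding and the bandwidth parameter sigma are illustrative choices, not notation from the paper; the paper's analysis applies to general kernel functions.

import numpy as np

def spectral_embedding(X, sigma=1.0, k=2):
    """Embed n samples (rows of X) via the top-k eigenvectors of the
    normalized Laplacian built from a Gaussian kernel (a sketch)."""
    # Pairwise squared distances; clip tiny negatives from round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # Symmetric normalization D^{-1/2} K D^{-1/2}; its top eigenvectors
    # coincide with the bottom eigenvectors of the normalized Laplacian
    # L = I - D^{-1/2} K D^{-1/2}.
    dinv = 1.0 / np.sqrt(K.sum(axis=1))
    M = dinv[:, None] * K * dinv[None, :]
    # Eigenvectors for the k largest eigenvalues (eigh sorts ascending).
    _, vecs = np.linalg.eigh(M)
    V = vecs[:, -k:]
    # Normalize rows to unit length: when component overlap is small,
    # embedded samples from different components are nearly orthogonal.
    return V / np.linalg.norm(V, axis=1, keepdims=True)

Labels are then typically obtained by running k-means on the rows of the embedding, e.g. KMeans(n_clusters=2, n_init=10).fit_predict(spectral_embedding(X)) with scikit-learn; approximate orthogonality of the embedded components is what makes this final clustering step easy.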