Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization

We provide evidence that nonlinear dimensionality reduction, clustering, and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme of simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms

[1]  Stéphane Lafon,et al.  Diffusion maps , 2006 .

[2]  Inderjit S. Dhillon,et al.  Kernel k-means: spectral clustering and normalized cuts , 2004, KDD.

[3]  Hongyuan Zha,et al.  Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment , 2002, ArXiv.

[4]  B. Nadler,et al.  Diffusion maps, spectral clustering and reaction coordinates of dynamical systems , 2005, math/0503445.

[5]  R. Coifman,et al.  Diffusion Wavelets , 2004 .

[6]  Ann B. Lee,et al.  Geometric diffusions as a tool for harmonic analysis and structure definition of data: multiscale methods. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[7]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[8]  Jianbo Shi,et al.  A Random Walks View of Spectral Segmentation , 2001, AISTATS.

[9]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[10]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[11]  Yair Weiss,et al.  Segmentation using eigenvectors: a unifying view , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[12]  Fan Chung,et al.  Spectral Graph Theory , 1996 .

[13]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[14]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[15]  D. Donoho,et al.  Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[16]  Arthur D. Szlam,et al.  Diffusion wavelet packets , 2006 .

[17]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory, Second Edition , 2000, Statistics for Engineering and Information Science.

[18]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[19]  Ann B. Lee,et al.  Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[20]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Tommi S. Jaakkola,et al.  Partially labeled classification with Markov random walks , 2001, NIPS.

[22]  D. Donoho,et al.  Hessian Eigenmaps : new locally linear embedding techniques for high-dimensional data , 2003 .

[23]  François Fouss,et al.  A novel way of computing similarities between nodes of a graph, with application to collaborative recommendation , 2005, The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05).

[24]  Kenneth Ward Church,et al.  Iterative Denoising for Cross-Corpus Discovery , 2004 .