Diffusion Maps: Analysis and Applications

Much of the data encountered in science and engineering is less complicated than it seems: although usually high dimensional, it often admits a low dimensional description. Diffusion maps are one way of finding such a description. They represent the data set by a weighted graph in which the data points correspond to vertices and the edges carry weights. The spectral properties of the graph Laplacian are then used to map the high dimensional data into a lower dimensional representation. The algorithm is introduced on simple test examples for which the low dimensional description is known. It is justified by showing its equivalence to a suitable minimisation problem and to random walks on graphs. The description of random walks in terms of partial differential equations is discussed: the heat equation for a probability density function is derived and used to analyse the algorithm further. Applications of diffusion maps are presented at the end of this dissertation. The first is the clustering of data (i.e. partitioning a data set into subsets so that the data points in each subset have similar characteristics); an approach based on diffusion maps (spectral clustering) is compared with the K-means clustering algorithm. We then discuss techniques for colour image quantization (reducing the number of distinct colours in an image). Finally, diffusion maps are used to discover low dimensional descriptions of high dimensional sets of images.
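
As a rough illustration of the construction summarised above, the following Python sketch builds a weighted graph on the data points, forms the random-walk transition matrix, and uses its leading non-trivial eigenvectors as low dimensional coordinates. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the Gaussian kernel, the bandwidth `epsilon`, the diffusion time `t`, and the function name `diffusion_map` are all illustrative choices introduced here.

```python
import numpy as np


def diffusion_map(X, epsilon, n_components=2, t=1):
    """Sketch of a diffusion map embedding (assumed Gaussian kernel).

    X            : (n_points, n_features) data array
    epsilon      : kernel bandwidth (assumption: chosen by the user)
    n_components : number of non-trivial coordinates to return
    t            : diffusion time (integer power applied to eigenvalues)
    """
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances between all data points.
    sq_norms = np.sum(X ** 2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    dist2 = np.maximum(dist2, 0.0)  # clip round-off negatives

    # Edge weights of the graph: Gaussian kernel of the distances.
    W = np.exp(-dist2 / epsilon)
    degree = W.sum(axis=1)

    # P = D^{-1} W is the random-walk transition matrix. It is similar to
    # the symmetric matrix S = D^{-1/2} W D^{-1/2}, which is safer to
    # diagonalise numerically.
    S = W / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(S)

    # Sort eigenpairs by decreasing eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]

    # Convert eigenvectors of S into right eigenvectors of P.
    psi = eigvecs / np.sqrt(degree)[:, None]

    # Drop the trivial pair (eigenvalue 1, constant eigenvector).
    keep = slice(1, n_components + 1)
    return eigvals[keep], psi[:, keep] * eigvals[keep] ** t


if __name__ == "__main__":
    # Hypothetical test data: a helix in 3-D, intrinsically 1-D.
    s = np.linspace(0.0, 4.0 * np.pi, 200)
    helix = np.column_stack([np.cos(s), np.sin(s), 0.1 * s])
    vals, coords = diffusion_map(helix, epsilon=0.5, n_components=2)
    print(vals.shape, coords.shape)
```

The dense eigendecomposition used here is only practical for small data sets; for larger ones, an iterative eigensolver that computes just the few leading eigenpairs would be the natural substitute.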
