Adaptive Geometric Multiscale Approximations for Intrinsically Low-dimensional Data

We consider the problem of efficiently approximating and encoding high-dimensional data sampled from a probability distribution $\rho$ in $\mathbb{R}^D$ that is nearly supported on a $d$-dimensional set $\mathcal{M}$, for example a $d$-dimensional Riemannian manifold. Geometric Multi-Resolution Analysis (GMRA) provides a robust and computationally efficient procedure for constructing low-dimensional geometric approximations of $\mathcal{M}$ at varying resolutions. We introduce a thresholding algorithm on the geometric wavelet coefficients, leading to what we call adaptive GMRA approximations. We show that, when the threshold is chosen as a suitable universal function of the number of samples $n$, these data-driven, empirical approximations perform well for a wide variety of measures $\rho$ that may exhibit different regularity at different scales and locations, thereby efficiently encoding data from measures more complex than those supported on manifolds. These approximations yield a data-driven dictionary, together with a fast transform mapping data to coefficients and an inverse of that map. The algorithms for both the dictionary construction and the transforms have complexity $C n \log n$, with the constant $C$ linear in $D$ and exponential in $d$. Our work therefore establishes adaptive GMRA as a fast dictionary learning algorithm with approximation guarantees. We include several numerical experiments on both synthetic and real data, confirming our theoretical results and demonstrating the effectiveness of adaptive GMRA.
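
To make the construction concrete, the sketch below builds a multiscale tree by recursive 2-means splitting, fits a rank-$d$ local PCA plane in each cell, and keeps refining a cell only while the norm of its geometric-wavelet correction exceeds a threshold $\tau$. This is a minimal illustration under simplifying assumptions, not the implementation analyzed in the paper: the tree construction, the stopping rule, and the names `build_tree`, `refinement_gain`, and `adaptive_approximation` are hypothetical stand-ins, and NumPy and scikit-learn are assumed to be available.

```python
import numpy as np
from sklearn.cluster import KMeans


class GMRANode:
    """One cell C_{j,k} of the multiscale partition."""
    def __init__(self, indices, scale):
        self.indices = indices    # indices of the data points in this cell
        self.scale = scale        # depth j in the tree
        self.center = None        # local mean c_{j,k}
        self.basis = None         # rows span the local rank-d PCA plane (d x D)
        self.children = []


def local_pca(X, d):
    """Rank-d affine approximation of a point cloud: mean + top-d principal subspace."""
    center = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - center, full_matrices=False)
    return center, Vt[:d]


def build_tree(X, d, indices=None, scale=0, min_points=20, max_scale=8):
    """Recursively split the data with 2-means, fitting a local PCA plane in each cell."""
    if indices is None:
        indices = np.arange(len(X))
    node = GMRANode(indices, scale)
    node.center, node.basis = local_pca(X[indices], d)
    if len(indices) > 2 * min_points and scale < max_scale:
        labels = KMeans(n_clusters=2, n_init=5, random_state=0).fit_predict(X[indices])
        parts = [indices[labels == lab] for lab in (0, 1)]
        if all(len(p) >= min_points for p in parts):  # accept the split only if both halves are populated
            node.children = [build_tree(X, d, p, scale + 1, min_points, max_scale) for p in parts]
    return node


def project(node, Xc):
    """Project points onto the affine plane attached to a cell."""
    return node.center + (Xc - node.center) @ node.basis.T @ node.basis


def refinement_gain(parent, child, X):
    """RMS size of the geometric-wavelet correction on the child's points:
    how much the child's plane improves on the parent's plane."""
    Xc = X[child.indices]
    return np.sqrt(np.mean(np.sum((project(child, Xc) - project(parent, Xc)) ** 2, axis=1)))


def adaptive_approximation(node, X, tau, out):
    """Refine a cell only while the wavelet correction exceeds the threshold tau
    (in the paper, tau is chosen as a universal function of the sample size n)."""
    if not node.children:
        out[node.indices] = project(node, X[node.indices])
        return
    for child in node.children:
        if refinement_gain(node, child, X) > tau:
            adaptive_approximation(child, X, tau, out)
        else:
            out[child.indices] = project(node, X[child.indices])


# Toy usage: a noisy circle (d = 1) embedded in R^10.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 2000)
X = np.zeros((2000, 10))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
X += 0.01 * rng.standard_normal(X.shape)

root = build_tree(X, d=1)
X_hat = np.empty_like(X)
adaptive_approximation(root, X, tau=0.02, out=X_hat)
print("mean squared approximation error:", np.mean(np.sum((X - X_hat) ** 2, axis=1)))
```

In the toy example at the end of the sketch, lowering $\tau$ refines the partition further (smaller cells and more retained coefficients), while raising it yields a coarser, cheaper approximation; the analysis in the paper prescribes how $\tau$ should scale with the sample size $n$.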
