Multiscale regression on unknown manifolds

We consider the regression problem of estimating functions defined on $\mathbb{R}^D$ but supported on a $d$-dimensional manifold $\mathcal{M} \subset \mathbb{R}^D$ with $d \ll D$. Drawing on ideas from multi-resolution analysis and nonlinear approximation, we construct low-dimensional coordinates on $\mathcal{M}$ at multiple scales and perform multiscale regression by local polynomial fitting. We propose a data-driven wavelet thresholding scheme that automatically adapts to the unknown regularity of the function, allowing for efficient estimation of functions exhibiting nonuniform regularity at different locations and scales. We analyze the generalization error of our method by proving finite-sample bounds, holding with high probability, for rich classes of priors. Our estimator attains optimal learning rates (up to logarithmic factors), as if the function were defined on a known Euclidean domain of dimension $d$ rather than an unknown manifold embedded in $\mathbb{R}^D$. The implemented algorithm has quasilinear complexity in the sample size, with constants linear in $D$ and exponential in $d$. Our work therefore establishes a new framework for regression on low-dimensional sets embedded in high dimensions, with fast implementation and strong theoretical guarantees.
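The abstract outlines a pipeline: a multiscale partition of the data, local coordinates by PCA on each cell, polynomial fitting within cells, and thresholding of the corrections between scales. The sketch below is a minimal NumPy illustration of that general idea, not the authors' algorithm: it substitutes recursive 2-means for the multiscale partition, uses degree-1 (linear) local fits, and applies a single RMS threshold as a stand-in for the data-driven wavelet thresholding. All names (`fit_cell`, `two_means`, `multiscale_fit`) and parameters (`n_min`, `tau`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_cell(X, y, d):
    """Least-squares linear fit in the top-d local PCA coordinates of a cell."""
    mu = X.mean(axis=0)
    # Local PCA: the top-d right singular vectors approximate the tangent space.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    V = Vt[:d].T  # D x d basis of local coordinates
    def design(Xq):
        return np.hstack([np.ones((len(Xq), 1)), (Xq - mu) @ V])
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return lambda Xq: design(Xq) @ coef

def two_means(X, iters=10):
    """Crude 2-means split of a cell; returns a boolean mask for one child."""
    c = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(iters):
        mask = (np.linalg.norm(X - c[0], axis=1)
                <= np.linalg.norm(X - c[1], axis=1))
        if mask.all() or not mask.any():
            break
        c = np.stack([X[mask].mean(axis=0), X[~mask].mean(axis=0)])
    return mask

def multiscale_fit(X, y, d, n_min=32, tau=0.05):
    """Refine the fit cell by cell; keep a child's correction only when it is
    large -- a crude stand-in for the paper's wavelet thresholding."""
    parent = fit_cell(X, y, d)
    if len(X) < 2 * n_min:
        return parent
    mask = two_means(X)
    if min(mask.sum(), (~mask).sum()) < n_min:
        return parent
    kids, centers = [], []
    for m in (mask, ~mask):
        child = multiscale_fit(X[m], y[m], d, n_min, tau)
        # "Wavelet coefficient": size of the child's correction on its cell.
        delta = np.sqrt(np.mean((child(X[m]) - parent(X[m])) ** 2))
        kids.append(child if delta > tau else parent)
        centers.append(X[m].mean(axis=0))
    def predict(Xq):
        side = (np.linalg.norm(Xq - centers[0], axis=1)
                <= np.linalg.norm(Xq - centers[1], axis=1))
        out = np.empty(len(Xq))
        out[side] = kids[0](Xq[side])
        out[~side] = kids[1](Xq[~side])
        return out
    return predict

# Demo: a noisy function on a circle (d = 1 manifold) embedded in R^10.
n, D = 2000, 10
t = rng.uniform(0, 2 * np.pi, n)
X = np.zeros((n, D))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
y = np.sin(3 * t) + 0.1 * rng.standard_normal(n)
f_hat = multiscale_fit(X, y, d=1)
print("train RMSE:", np.sqrt(np.mean((f_hat(X) - y) ** 2)))
```

Keeping the parent's fit on cells where the correction falls below `tau` is what produces adaptivity to nonuniform regularity: smooth regions stop refining at coarse scales, while rougher regions continue down to finer ones.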
