Learning adaptive multiscale approximations to data and functions near low-dimensional sets

In the setting where a data set in ℝ^D consists of samples from a probability measure ρ concentrated on or near an unknown d-dimensional set M, with D large but d ≪ D, we consider two sets of problems: geometric approximation of M and regression of a function f on M. In the first case we construct multiscale low-dimensional empirical approximations of M, which are adaptive when the geometric regularity of M varies at different locations and scales, and we give performance guarantees. In the second case we exploit these empirical geometric approximations to construct multiscale approximations to f on M, which adapt to the unknown regularity of f even when it varies at different locations and scales. We prove guarantees showing that we attain the same learning rates as if f were defined on a Euclidean domain of dimension d, rather than on the unknown manifold M. All algorithms have complexity O(n log n), with constants scaling linearly in D and exponentially in d.
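To make the two problems concrete, here is a minimal illustrative sketch in the spirit of a multiscale partition with local PCA: the data is recursively split into cells, each cell stores a rank-d affine approximation of M (center plus top-d principal directions) and a linear fit of f in the d local coordinates. Everything here is an assumption of the sketch, not the paper's construction: the two-way split along the top principal direction, the stopping rule, and the names (local_pca, build_tree, predict, min_pts) are all hypothetical, and the sketch refines to a fixed finest scale rather than performing the adaptive, per-location scale selection the abstract refers to.

```python
import numpy as np

def local_pca(X, d):
    """Center + top-d principal directions of a cell (rank-d affine fit)."""
    c = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - c, full_matrices=False)
    return c, Vt[:d]

def build_tree(X, y, d, min_pts=32, depth=0, max_depth=10):
    """Recursively partition the samples; each cell stores a local PCA plane
    and a least-squares linear model of y in the d local PCA coordinates."""
    c, V = local_pca(X, d)
    coords = (X - c) @ V.T                          # n x d local coordinates
    A = np.hstack([np.ones((len(X), 1)), coords])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)    # local linear model of f
    node = {"center": c, "basis": V, "beta": beta, "children": ()}
    if len(X) >= 2 * min_pts and depth < max_depth:
        side = coords[:, 0] > 0                     # split on top direction (heuristic)
        if side.any() and (~side).any():
            node["children"] = (
                build_tree(X[~side], y[~side], d, min_pts, depth + 1, max_depth),
                build_tree(X[side], y[side], d, min_pts, depth + 1, max_depth),
            )
    return node

def finest_cell(node, x):
    """Descend to the finest cell containing x, using the same split rule."""
    while node["children"]:
        go_right = (x - node["center"]) @ node["basis"][0] > 0
        node = node["children"][1 if go_right else 0]
    return node

def project(tree, x):
    """Geometric approximation: project x onto the PCA plane of its finest cell."""
    leaf = finest_cell(tree, x)
    c, V = leaf["center"], leaf["basis"]
    return c + V.T @ (V @ (x - c))

def predict(tree, x):
    """Regression: evaluate the piecewise-linear estimate of f at x."""
    leaf = finest_cell(tree, x)
    u = leaf["basis"] @ (x - leaf["center"])        # d local coordinates of x
    return leaf["beta"][0] + leaf["beta"][1:] @ u

# Illustrative use: noisy samples near a circle (d = 1) embedded in R^50
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2 * np.pi, size=4000)
X = np.zeros((4000, 50))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
X += 0.01 * rng.normal(size=X.shape)
y = np.sin(3 * t)                                   # a smooth function on M
tree = build_tree(X, y, d=1)
print(np.linalg.norm(project(tree, X[0]) - X[0]))   # small geometric residual
print(abs(predict(tree, X[0]) - y[0]))              # small regression error
```

The sketch always evaluates at the finest cell; the adaptive estimators described in the abstract instead select, at each location, the scale that balances approximation bias against sampling variance, which is what yields the stated learning rates and the O(n log n) complexity.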