Learning Functions of Few Arbitrary Linear Parameters in High Dimensions

Abstract

Let us assume that f is a continuous function defined on the unit ball of ℝ^d, of the form f(x) = g(Ax), where A is a k × d matrix and g is a function of k variables, with k ≪ d. We are given a budget m ∈ ℕ of point evaluations f(x_i), i = 1, …, m, which we are allowed to query in order to construct a uniform approximation of f. Under certain smoothness and variation assumptions on the function g, and for an arbitrary choice of the matrix A, we present in this paper:

1. a sampling choice of the points {x_i}, drawn at random for each function approximation;
2. algorithms (Algorithm 1 and Algorithm 2) for computing the approximating function, whose complexity is at most polynomial in the dimension d and in the number m of points.

Due to the arbitrariness of A, the sampling points are chosen according to suitable random distributions, and our results hold with overwhelming probability. Our approach uses tools from the compressed sensing framework, recent Chernoff bounds for sums of positive semidefinite matrices, and classical stability bounds for invariant subspaces of singular value decompositions.
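To make the setting concrete, the following is a minimal numerical sketch of the core geometric idea: since ∇f(x) = Aᵀ∇g(Ax), every gradient of f lies in the k-dimensional row space of A, so an SVD of a matrix of estimated gradients recovers that subspace. This is only an illustration under assumed choices (a toy g, plain forward differences, and a full SVD in place of the paper's compressed-sensing recovery step); it is not Algorithm 1 or Algorithm 2 from the paper.

```python
import numpy as np

# Hedged sketch (NOT the paper's Algorithm 1/2): recover the active
# subspace span(A^T) of f(x) = g(Ax) by estimating gradients of f at
# random points via finite differences, then taking an SVD of the
# stacked gradient matrix. All choices below (g, step size, budget)
# are illustrative assumptions.

rng = np.random.default_rng(0)

d, k = 50, 2                                    # ambient dimension, number of linear parameters
A = rng.standard_normal((k, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)   # unit-norm rows (illustrative normalization)

def g(y):
    # Smooth toy function of k = 2 variables.
    return np.sin(y[0]) + y[1] ** 2

def f(x):
    # The high-dimensional function; we may only query point values.
    return g(A @ x)

m, eps = 200, 1e-5                              # query budget and finite-difference step
grads = np.empty((m, d))
for i in range(m):
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)                      # sample on the unit sphere
    base = f(x)
    for j in range(d):
        # Forward difference: (f(x + eps*e_j) - f(x)) / eps ~ df/dx_j.
        e = np.zeros(d)
        e[j] = eps
        grads[i, j] = (f(x + e) - base) / eps

# Gradients of f lie in the row space of A, so the top-k right singular
# vectors of the gradient matrix approximate span(A^T).
_, _, Vt = np.linalg.svd(grads, full_matrices=False)
A_hat = Vt[:k]

# Compare recovered and true subspaces via their orthogonal projectors.
P_true = A.T @ np.linalg.pinv(A.T)
P_hat = A_hat.T @ A_hat                         # rows of Vt are orthonormal
print("subspace error:", np.linalg.norm(P_true - P_hat, 2))
```

Note the cost: the dense finite-difference loop above spends m(d + 1) point queries. The paper's algorithms instead sample directional differences f(x + εφ) - f(x) along a small number of random directions φ and exploit compressibility of the gradients, which is where the compressed sensing machinery and the matrix Chernoff bounds enter.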
