Entropy and Sampling Numbers of Classes of Ridge Functions

We study the properties of ridge functions $f(x)=g(a\cdot x)$ in high dimensions $d$ from the viewpoint of approximation theory. The function classes considered consist of ridge functions such that the profile $g$ is a member of a univariate Lipschitz class with smoothness $\alpha > 0$ (including infinite smoothness) and the ridge direction $a$ has $p$-norm $\Vert a\Vert_p \le 1$. First, we investigate entropy numbers in order to quantify the compactness of these ridge function classes in $L_{\infty}$. We show that they are essentially as compact as the class of univariate Lipschitz functions. Second, we examine sampling numbers and consider two extreme cases. In the case $p=2$, sampling ridge functions on the Euclidean unit ball suffers from the curse of dimensionality. Moreover, it is as difficult as sampling general multivariate Lipschitz functions, which is in sharp contrast to the result on entropy numbers. When we additionally assume that all feasible profiles have a first derivative uniformly bounded away from zero at the origin, the complexity of sampling ridge functions reduces drastically to the complexity of sampling univariate Lipschitz functions. In between, the sampling problem's degree of difficulty varies, depending on the values of $\alpha$ and $p$. Surprisingly, we see almost the entire hierarchy of tractability levels as introduced in the recent monographs by Novak and Woźniakowski.
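To make the role of the derivative condition concrete, the following is a minimal sketch, not the paper's algorithm. It relies on the identity $\nabla f(0) = g'(0)\,a$, so when $g'(0) \neq 0$ the direction $a$ can be read off, up to scaling, from a finite-difference gradient at the origin. The helper names `ridge` and `estimate_direction`, the profile `np.tanh` (with $g'(0)=1$), the dimension and the step size `h` are all illustrative assumptions.

```python
import numpy as np


def ridge(g, a):
    """Return the ridge function f(x) = g(a . x) for profile g and direction a."""
    a = np.asarray(a, dtype=float)
    return lambda x: g(np.dot(a, x))


def estimate_direction(f, d, h=1e-5):
    """Estimate a / ||a||_2 from central differences of f at the origin.

    Uses 2*d point evaluations. Assumes g'(0) > 0; otherwise the sign
    of the result is flipped (or the gradient vanishes).
    """
    grad = np.empty(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        grad[i] = (f(e) - f(-e)) / (2.0 * h)  # approximates g'(0) * a_i
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        raise ValueError("gradient vanishes at the origin; direction not identifiable")
    return grad / norm


# Illustrative setup (assumed values): d = 20, unit Euclidean direction, g = tanh.
rng = np.random.default_rng(0)
d = 20
a = rng.standard_normal(d)
a /= np.linalg.norm(a)

f = ridge(np.tanh, a)
a_hat = estimate_direction(f, d)

# With the direction in hand, f(t * a_hat) approximates g(t): a univariate problem.
profile_sample = f(0.3 * a_hat)
```

Once `a_hat` is available, the remaining samples can be spent along the line $t \mapsto t\,\hat a$, which is the sense in which the problem becomes univariate. If $g'(0)$ is zero or very small, the finite-difference gradient carries little or no information about $a$ and this shortcut breaks down.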
