Deep Nonparametric Regression on Approximately Low-dimensional Manifolds

In this paper, we study the properties of nonparametric least squares regression using deep neural networks. We derive non-asymptotic upper bounds for the prediction error of the empirical risk minimizer over classes of feedforward deep neural networks. Our error bounds achieve the minimax optimal rate and significantly improve over the existing ones in the sense that they depend linearly or quadratically on the dimension d of the predictor, rather than exponentially on d. We show that the neural regression estimator can circumvent the curse of dimensionality under the assumption that the predictor is supported on an approximate low-dimensional manifold. This assumption differs from the structural conditions imposed on the target regression function in related work, and it is weaker and more realistic than the exact low-dimensional manifold support assumption in the existing literature. We investigate how the prediction error of the neural regression estimator depends on the structure of the neural networks and propose a notion of network relative efficiency between two types of neural networks, which provides a quantitative measure for evaluating the relative merits of different network structures. Our results are derived under weaker assumptions on the data distribution, the target regression function, and the neural network structure than those in the existing literature.
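To make the setup concrete, the abstract does not fix notation, so the following display is a standard rendering of the least squares empirical risk minimizer it describes; the smoothness index \beta and the intrinsic dimension d_0 are symbols introduced here for illustration, not taken from the paper:

\[
\hat f_n \in \operatorname*{arg\,min}_{f \in \mathcal F_n} \; \frac{1}{n} \sum_{i=1}^{n} \bigl( Y_i - f(X_i) \bigr)^2 ,
\]

where \mathcal F_n is a class of feedforward ReLU networks. Under a Hölder smoothness condition of order \beta on the target regression function, the classical minimax rate for the prediction error is n^{-2\beta/(2\beta + d)}, which deteriorates rapidly as the ambient dimension d grows; under the approximate low-dimensional manifold assumption, the ambient d in the exponent is effectively replaced by the much smaller intrinsic dimension d_0, up to logarithmic factors.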
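As a complement, here is a minimal, self-contained PyTorch sketch of the estimator the abstract analyzes: squared-error empirical risk minimization over a feedforward ReLU network class, with data supported near a low-dimensional manifold embedded in a higher-dimensional space. All names (fit_erm, width, depth) and hyperparameters are illustrative choices, not from the paper, and the gradient method below only approximates the exact minimizer that the theory assumes.

```python
import torch
from torch import nn

def fit_erm(X, Y, width=64, depth=3, epochs=500, lr=1e-3):
    """Approximate the least squares ERM over a feedforward ReLU class."""
    d = X.shape[1]
    layers, in_dim = [], d
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), nn.ReLU()]
        in_dim = width
    layers.append(nn.Linear(in_dim, 1))  # scalar regression output
    net = nn.Sequential(*layers)
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # empirical squared-error risk
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(X).squeeze(-1), Y)
        loss.backward()
        opt.step()
    return net

# Toy example: predictors on a 1-dimensional curve embedded in R^10,
# mimicking the low intrinsic dimension / high ambient dimension setting.
torch.manual_seed(0)
n, d = 500, 10
t = torch.rand(n, 1)  # latent 1-d coordinate on the manifold
X = torch.cat([torch.sin(2 * torch.pi * t), torch.cos(2 * torch.pi * t)], dim=1)
X = torch.cat([X, torch.zeros(n, d - 2)], dim=1)  # embed the circle in R^10
Y = torch.sin(4 * torch.pi * t).squeeze(-1) + 0.1 * torch.randn(n)
net = fit_erm(X, Y)
```

Although X lives in R^10, the regression function depends only on the 1-dimensional latent coordinate, which is the regime where the paper's bounds scale with the intrinsic rather than the ambient dimension.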
