Optimal Nonparametric Inference via Deep Neural Network

Deep neural networks are a state-of-the-art method in modern science and technology. A substantial statistical literature has been devoted to understanding their performance in nonparametric estimation, but the existing results are suboptimal because they sacrifice a redundant logarithmic factor. In this paper, we show that such log-factors are not necessary. We derive upper bounds for the $L^2$ minimax risk in nonparametric estimation and provide sufficient conditions on the network architecture under which these upper bounds become optimal (without the log-sacrifice). Our proof relies on an explicitly constructed network estimator based on tensor-product B-splines. We also derive the asymptotic distribution of the constructed network estimator and a related hypothesis testing procedure, which is further shown to be minimax optimal under suitable network architectures.
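
For context, here is a standard formulation of the rate the abstract alludes to (the smoothness class $\mathcal{C}^{\beta}([0,1]^d)$ and the exponent notation below are generic illustrations, not taken verbatim from the paper): for regression with a $\beta$-smooth $d$-dimensional target $f_0$, the classical $L^2$ minimax rate is
$$ \inf_{\hat f_n}\ \sup_{f_0 \in \mathcal{C}^{\beta}([0,1]^d)} \mathbb{E}\,\big\|\hat f_n - f_0\big\|_{L^2}^2 \;\asymp\; n^{-2\beta/(2\beta+d)}, $$
whereas earlier neural-network upper bounds typically attain it only up to an extra logarithmic factor, on the order of $n^{-2\beta/(2\beta+d)}\log^{c} n$ for some $c>0$. The contribution described above is to remove that extra factor under suitable architecture conditions.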
