Optimal Nonparametric Inference via Deep Neural Network

Deep neural networks are a state-of-the-art method in modern science and technology. A substantial statistical literature has been devoted to understanding their performance in nonparametric estimation, but the existing results are suboptimal because they sacrifice a redundant logarithmic factor. In this paper, we show that such log-factors are not necessary. We derive upper bounds for the $L^2$ minimax risk in nonparametric estimation and provide sufficient conditions on the network architecture under which these upper bounds become optimal (without the log-sacrifice). Our proof relies on an explicitly constructed network estimator based on tensor-product B-splines. We also derive the asymptotic distribution of the constructed network estimator and a related hypothesis testing procedure, which is further shown to be minimax optimal under suitable network architectures.
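
For context, here is a standard formulation of the rate the abstract alludes to (the smoothness class $\mathcal{C}^{\beta}([0,1]^d)$ and the exponent notation below are generic illustrations, not taken verbatim from the paper): for regression with a $\beta$-smooth $d$-dimensional target $f_0$, the classical $L^2$ minimax rate is
$$ \inf_{\hat f_n}\ \sup_{f_0 \in \mathcal{C}^{\beta}([0,1]^d)} \mathbb{E}\,\big\|\hat f_n - f_0\big\|_{L^2}^2 \;\asymp\; n^{-2\beta/(2\beta+d)}, $$
whereas earlier neural-network upper bounds typically attain it only up to an extra logarithmic factor, on the order of $n^{-2\beta/(2\beta+d)}\log^{c} n$ for some $c>0$. The contribution described above is to remove that extra factor under suitable architecture conditions.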
