Sharp Rate of Convergence for Deep Neural Network Classifiers under the Teacher-Student Setting