Guang Cheng | Wenjia Wang | Tianyang Hu | Cong Lin
[1] Dmitry Yarotsky,et al. Error bounds for approximations with deep ReLU networks , 2016, Neural Networks.
[2] R. Varga. Geršgorin And His Circles , 2004 .
[3] Arthur Jacot,et al. Neural tangent kernel: convergence and generalization in neural networks (invited paper) , 2018, NeurIPS.
[4] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[5] Barnabás Póczos,et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.
[6] S. Geer. On the uniform convergence of empirical norms and inner products, with application to causal inference , 2013, 1310.5523.
[7] Colin Wei,et al. Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel , 2018, NeurIPS.
[8] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[9] Ruosong Wang,et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks , 2019, ICML.
[10] W. W. Daniel. Applied Nonparametric Statistics , 1979 .
[11] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[12] Tri Dao,et al. A Kernel Theory of Modern Data Augmentation , 2018, ICML.
[13] Yuanzhi Li,et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data , 2018, NeurIPS.
[14] R. Basri,et al. On the Similarity between the Laplace and Neural Tangent Kernels , 2020, NeurIPS.
[15] Rich Caruana,et al. Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping , 2000, NIPS.
[16] Kaifeng Lyu,et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks , 2019, ICLR.
[17] Taiji Suzuki,et al. Refined Generalization Analysis of Gradient Descent for Over-parameterized Two-layer Neural Networks with Smooth Activations on Classification Problems , 2019, ArXiv.
[18] Yuan Cao,et al. Towards Understanding the Spectral Bias of Deep Learning , 2021, IJCAI.
[19] Johannes Schmidt-Hieber,et al. Nonparametric regression using deep neural networks with ReLU activation function , 2017, The Annals of Statistics.
[20] Matus Telgarsky,et al. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks , 2020, ICLR.
[21] Ekachai Phaisangittisagul,et al. An Analysis of the Regularization Between L2 and Dropout in Single Hidden Layer Neural Network , 2016, 2016 7th International Conference on Intelligent Systems, Modelling and Simulation (ISMS).
[22] Jing Wang,et al. Entropy numbers of Besov classes of generalized smoothness on the sphere , 2014 .
[23] Anders Krogh,et al. A Simple Weight Decay Can Improve Generalization , 1991, NIPS.
[24] Lin Chen,et al. Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS , 2020, ICLR.
[25] Adam Krzyzak,et al. Over-parametrized deep neural networks do not generalize well , 2019 .
[26] S. Geer. Empirical Processes in M-Estimation , 2000 .
[27] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput.
[28] Léon Bottou,et al. Wasserstein GAN , 2017, ArXiv.
[29] C. Frye,et al. Spherical Harmonics in p Dimensions , 2012, 1205.3548.
[30] Quanquan Gu,et al. An Improved Analysis of Training Over-parameterized Deep Neural Networks , 2019, NeurIPS.
[31] G. Wahba,et al. Some results on Tchebycheffian spline functions , 1971 .
[32] 俊一 甘利. 5分で分かる!? 有名論文ナナメ読み:Jacot, Arthor, Gabriel, Franck and Hongler, Clement : Neural Tangent Kernel : Convergence and Generalization in Neural Networks , 2020 .
[33] Matus Telgarsky,et al. The implicit bias of gradient descent on nonseparable data , 2019, COLT.
[34] Twan van Laarhoven,et al. L2 Regularization versus Batch and Weight Normalization , 2017, ArXiv.
[35] Atsushi Nitanda,et al. Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime , 2021, ICLR.
[36] Ruiqi Liu,et al. Optimal Nonparametric Inference via Deep Neural Network , 2019, ArXiv.
[37] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[38] Ming Yuan,et al. Minimax Optimal Rates of Estimation in High Dimensional Additive Models: Universal Phase Transition , 2015, ArXiv.
[39] Kawin Setsompop,et al. Fast image reconstruction with L2‐regularization , 2013, Journal of magnetic resonance imaging : JMRI.
[40] Aitor Lewkowycz,et al. On the training dynamics of deep networks with L2 regularization , 2020 .
[41] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[42] C. J. Stone,et al. Optimal Global Rates of Convergence for Nonparametric Regression , 1982 .
[43] S. Shott,et al. Nonparametric Statistics , 2018, The Encyclopedia of Archaeological Sciences.
[44] Julien Mairal,et al. On the Inductive Bias of Neural Tangent Kernels , 2019, NeurIPS.
[45] Quanquan Gu,et al. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks , 2019, AAAI.
[46] Lutz Prechelt,et al. Early Stopping - But When? , 2012, Neural Networks: Tricks of the Trade.
[47] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[48] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[49] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] K. Atkinson,et al. Spherical Harmonics and Approximations on the Unit Sphere: An Introduction , 2012 .
[51] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res.
[52] Martin J. Wainwright,et al. Early stopping for non-parametric regression: An optimal data-dependent stopping rule , 2011, 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[53] M. Kohler,et al. On deep learning as a remedy for the curse of dimensionality in nonparametric regression , 2019, The Annals of Statistics.
[54] Zhiyuan Li,et al. Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee , 2019, ICLR.
[55] Francis R. Bach,et al. Breaking the Curse of Dimensionality with Convex Neural Networks , 2014, J. Mach. Learn. Res.
[56] Kenji Fukumizu,et al. Deep Neural Networks Learn Non-Smooth Functions Effectively , 2018, AISTATS.
[57] V. G. Troitsky,et al. Journal of Mathematical Analysis and Applications , 1960 .
[58] J. Dick,et al. A Characterization of Sobolev Spaces on the Sphere and an Extension of Stolarsky’s Invariance Principle to Arbitrary Smoothness , 2012, 1203.5157.