[1] Geoffrey E. Hinton, et al. Learning distributed representations of concepts, 1989.
[2] Nathan Srebro, et al. Exploring Generalization in Deep Learning, 2017, NIPS.
[3] Ethan Dyer, et al. Asymptotics of Wide Networks from Feynman Diagrams, 2019, ICLR.
[4] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[5] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[6] Yann Dauphin, et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks, 2017, ICLR.
[7] Guodong Zhang, et al. Three Mechanisms of Weight Decay Regularization, 2018, ICLR.
[8] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[9] Ethan Dyer, et al. Gradient Descent Happens in a Tiny Subspace, 2018, ArXiv.
[10] Sanjeev Arora, et al. An Exponential Learning Rate Schedule for Deep Learning, 2020, ICLR.
[11] Colin Wei, et al. Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel, 2018, NeurIPS.
[12] Yoshua Bengio, et al. On the Spectral Bias of Neural Networks, 2018, ICML.
[13] Elad Hoffer, et al. Norm matters: efficient and accurate normalization schemes in deep networks, 2018, NeurIPS.
[14] Levent Sagun, et al. Scaling description of generalization with number of parameters in deep learning, 2019, Journal of Statistical Mechanics: Theory and Experiment.
[15] Jascha Sohl-Dickstein, et al. The large learning rate phase of deep learning: the catapult mechanism, 2020, ArXiv.
[16] Ruosong Wang, et al. On Exact Computation with an Infinitely Wide Neural Net, 2019, NeurIPS.
[17] Jaehoon Lee, et al. Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes, 2018, ICLR.
[18] Shankar Krishnan, et al. An Investigation into Neural Net Optimization via Hessian Eigenvalue Density, 2019, ICML.
[19] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[20] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[21] Twan van Laarhoven, et al. L2 Regularization versus Batch and Weight Normalization, 2017, ArXiv.