Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio
Stanislaw Jastrzebski | Zachary Kenton | Devansh Arpit | Nicolas Ballas | Asja Fischer | Yoshua Bengio | Amos J. Storkey
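The title's claim can be made precise through the stochastic differential equation (SDE) view of SGD developed in this line of work (cf. [2], [17], [21]). As a hedged sketch, with the notation assumed here rather than quoted from the paper (η for learning rate, S for batch size, L for the loss, C(θ) for the covariance of per-example gradients), modeling mini-batch gradient noise as approximately Gaussian gives the continuous-time approximation

\[
d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{\frac{\eta}{S}}\, B(\theta_t)\, dW_t,
\qquad B(\theta_t)\, B(\theta_t)^{\top} = C(\theta_t).
\]

Under this approximation the noise magnitude, and hence the stationary distribution over minima, depends on η and S only through the ratio η/S: a larger ratio means stronger effective noise, which biases SGD toward wider (flatter) minima.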
[1] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[2] David M. Blei, et al. Stochastic Gradient Descent as Approximate Bayesian Inference, 2017, J. Mach. Learn. Res.
[3] Quoc V. Le, et al. Understanding Generalization and Stochastic Gradient Descent, 2017.
[4] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.
[5] Lei Wu, et al. The Regularization Effects of Anisotropic Noise in Stochastic Gradient Descent, 2018, ArXiv.
[6] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.
[7] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[8] Roland Vollgraf, et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017, ArXiv.
[9] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[10] Elad Hoffer, et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks, 2017, NIPS.
[11] Stefano Soatto, et al. Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks, 2018, Information Theory and Applications Workshop (ITA).
[12] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[13] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[14] Lorenzo Rosasco, et al. Theory of Deep Learning III: explaining the non-overfitting puzzle, 2017, ArXiv.
[15] P. Kloeden, et al. Numerical Solution of Stochastic Differential Equations, 1992.
[16] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.
[17] E Weinan, et al. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms, 2015, ICML.
[18] Yoshua Bengio, et al. A Closer Look at Memorization in Deep Networks, 2017, ICML.
[19] Yann Dauphin, et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks, 2017, ICLR.
[20] Léon Bottou. Online Learning and Stochastic Approximations, 1998.
[21] Jian-Guo Liu, et al. Batch Size Matters: A Diffusion Approximation Framework on Nonconvex Stochastic Gradient Descent, 2017, ArXiv.