SGD Smooths the Sharpest Directions
Stanisław Jastrzębski | Zachary Kenton | Nicolas Ballas | Asja Fischer | Yoshua Bengio | Amos Storkey
[1] Lorenzo Rosasco, et al. Theory of Deep Learning III: explaining the non-overfitting puzzle, 2017, ArXiv.
[2] Yann Dauphin, et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks, 2017, ICLR.
[3] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[4] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[6] Yann LeCun, et al. Singularity of the Hessian in Deep Learning, 2016, ArXiv.
[7] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.
[8] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, 1950.
[9] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[10] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[11] Léon Bottou. Stochastic Gradient Learning in Neural Networks, 1991, Proceedings of Neuro-Nîmes.
[12] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.
[13] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[14] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, ArXiv.