Michael Möller | Daniel Cremers | Thomas Frerix | Thomas Möllenhoff
[1] Léon Bottou. Stochastic Gradient Learning in Neural Networks , 1991 .
[2] H. Robbins. A Stochastic Approximation Method , 1951 .
[3] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[4] J. Moreau. Proximité et dualité dans un espace hilbertien , 1965 .
[5] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .
[6] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[7] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[8] Stefano Soatto,et al. Deep relaxation: partial differential equations for optimizing deep neural networks , 2017, Research in the Mathematical Sciences.
[9] Surya Ganguli,et al. Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods , 2013, ICML.
[10] Jorge Nocedal,et al. Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..
[11] Stefano Soatto,et al. Entropy-SGD: biasing gradient descent into wide valleys , 2016, ICLR.
[12] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[13] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.
[14] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[15] Stephen J. Wright,et al. Numerical Optimization , 2018, Fundamental Statistical Inference.
[16] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[17] Zheng Xu,et al. Training Neural Networks Without Gradients: A Scalable ADMM Approach , 2016, ICML.
[18] Yann LeCun,et al. A Theoretical Framework for Back-Propagation , 1988 .
[19] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[20] Miguel Á. Carreira-Perpiñán,et al. Distributed optimization of deeply nested systems , 2012, AISTATS.
[21] Quoc V. Le,et al. On optimization methods for deep learning , 2011, ICML.
[22] B. Martinet. Brève communication. Régularisation d'inéquations variationnelles par approximations successives , 1970 .
[23] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[24] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[25] Stephen J. Wright,et al. Numerical Optimization (Springer Series in Operations Research and Financial Engineering) , 2000 .