An Adaptive Optimization Method Based on Learning Rate Schedule for Neural Networks
Dokkyun Yi | Sangmin Ji | Jieun Park
[1] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.
[2] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, arXiv.
[3] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[4] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[5] Sanjeev Arora, et al. An Exponential Learning Rate Schedule for Deep Learning, 2020, ICLR.
[6] Yoshua Bengio, et al. Practical Recommendations for Gradient-Based Training of Deep Architectures, 2012, Neural Networks: Tricks of the Trade.
[7] Razvan Pascanu, et al. Revisiting Natural Gradient for Deep Networks, 2013, ICLR.
[8] C. Kelley. Iterative Methods for Linear and Nonlinear Equations, 1987.
[9] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.
[10] Sham M. Kakade, et al. The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure, 2019, NeurIPS.
[11] Carl Tim Kelley, et al. Iterative Methods for Optimization, 1999, Frontiers in Applied Mathematics.
[12] Dokkyun Yi, et al. An Effective Optimization Method for Machine Learning Based on ADAM, 2020, Applied Sciences.
[13] Nitish Srivastava, et al. Improving neural networks by preventing co-adaptation of feature detectors, 2012, arXiv.