A Cyclical Learning Rate Method in Deep Learning Training

The learning rate is an important hyperparameter for training deep neural networks, and traditional learning rate schedules suffer from unstable accuracy. To address this problem, we propose a new learning rate method that applies a different cyclical variation in each training cycle instead of a fixed value, achieving higher accuracy in fewer iterations with faster convergence. Experiments on the CIFAR-10 and CIFAR-100 datasets with VGG and ResNet networks show that the proposed method outperforms the cyclical learning rate method in both stability and accuracy.
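The cyclical learning rate baseline against which the method is compared follows Smith's triangular policy, in which the learning rate oscillates linearly between a lower and an upper bound. The sketch below is a minimal Python illustration of that baseline, not the authors' exact schedule; the per-cycle amplitude decay (`decay`) is an assumption added only to show a schedule whose cycles differ from one another rather than repeating a fixed pattern.

```python
import math

def cyclical_lr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=2000, decay=0.9):
    """Triangular cyclical learning rate with a hypothetical per-cycle
    amplitude decay (the decay factor is an illustrative assumption,
    not the paper's method).

    iteration : current training iteration (0-based)
    base_lr   : lower bound of the learning rate range
    max_lr    : upper bound of the learning rate range
    step_size : half the number of iterations in one cycle
    decay     : factor shrinking the cycle amplitude each full cycle
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    amplitude = (max_lr - base_lr) * (decay ** (cycle - 1))
    return base_lr + amplitude * max(0.0, 1.0 - x)

# Usage sketch with a PyTorch-style optimizer loop (hypothetical names):
# for it, (inputs, targets) in enumerate(loader):
#     for group in optimizer.param_groups:
#         group["lr"] = cyclical_lr(it)
#     ...forward pass, backward pass, optimizer.step()...
```

With `decay=1.0` the schedule reduces to the plain triangular policy; shrinking the amplitude over successive cycles is one simple way to make each training cycle behave differently, in the spirit of the variation described in the abstract.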
