Appropriate Learning Rates of Adaptive Learning Rate Optimization Algorithms for Training Deep Neural Networks

This article deals with nonconvex stochastic optimization problems in deep learning. It provides theoretically grounded learning rates for adaptive-learning-rate optimization algorithms (e.g., Adam and AMSGrad) to approximate stationary points of such problems, and these rates are shown to allow faster convergence than previously reported for these algorithms. In numerical experiments on text and image classification, the algorithms perform better with constant learning rates than with diminishing learning rates.
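As a rough illustration of the comparison described above, the following minimal sketch (not the paper's code) runs the standard Adam update on a toy quadratic objective with a constant learning rate versus a diminishing one of the form alpha0/sqrt(t). The objective, dimension, step counts, and hyperparameter values are placeholder assumptions chosen only for illustration.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; alpha may be constant or a diminishing value such as alpha0/sqrt(t)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem (placeholder): f(theta) = ||theta||^2 / 2, so grad f(theta) = theta.
theta_const = np.ones(4)
theta_dimin = np.ones(4)
m_c = v_c = m_d = v_d = np.zeros(4)
alpha0 = 1e-3
for t in range(1, 1001):
    theta_const, m_c, v_c = adam_step(theta_const, theta_const, m_c, v_c, t, alpha=alpha0)                # constant rate
    theta_dimin, m_d, v_d = adam_step(theta_dimin, theta_dimin, m_d, v_d, t, alpha=alpha0 / np.sqrt(t))   # diminishing rate
print(np.linalg.norm(theta_const), np.linalg.norm(theta_dimin))
```

This sketch only shows how the two learning-rate schedules plug into the same Adam update; the paper's actual experiments concern deep networks for text and image classification, not this toy objective.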
