A framework for the development of globally convergent adaptive learning rate algorithms

In this paper we propose a framework for developing globally convergent batch training algorithms with an adaptive learning rate. The framework provides conditions under which global convergence is guaranteed for adaptive learning rate training algorithms; to satisfy them, the learning rate is appropriately tuned along the given descent direction. By imposing conditions on the search direction and the corresponding stepsize, the framework also guarantees global convergence for training algorithms that use a different learning rate for each weight. Simulation results on various training algorithms illustrate the effectiveness of the proposed approach.
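As a concrete illustration (not taken from the paper), the standard way to obtain such a global convergence guarantee is to backtrack the learning rate along a descent direction until a sufficient-decrease condition holds. The Python sketch below shows this recipe with the Armijo condition; the function names, the constants sigma and beta, and the toy objective are all illustrative assumptions, not the authors' algorithm.

import numpy as np

def armijo_step(f, grad, w, d, lr0=1.0, beta=0.5, sigma=1e-4, max_halvings=30):
    """Backtrack the learning rate along descent direction d until
    f(w + lr*d) <= f(w) + sigma * lr * grad(w)^T d  (Armijo condition)."""
    fw, g = f(w), grad(w)
    slope = g @ d                  # must be negative for a descent direction
    lr = lr0
    for _ in range(max_halvings):
        if f(w + lr * d) <= fw + sigma * lr * slope:
            return lr
        lr *= beta
    return lr

def train(f, grad, w, iters=200, tol=1e-8):
    """Batch steepest descent with an adaptively tuned learning rate."""
    for _ in range(iters):
        g = grad(w)
        if np.linalg.norm(g) < tol:
            break
        d = -g                     # any direction with g^T d < 0 would do
        lr = armijo_step(f, grad, w, d)
        w = w + lr * d
    return w

# Toy usage: minimize a convex quadratic "error" function.
A = np.array([[3.0, 0.2], [0.2, 1.0]])
f = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w
print(train(f, grad, np.array([1.0, -2.0])))

Under standard assumptions (a Lipschitz-continuous gradient and search directions bounded away from orthogonality with the gradient), iterates produced by such a backtracking scheme are globally convergent in the sense that every limit point is a stationary point of the error function.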
