Automatic learning rate optimization by higher-order derivatives

Automatic optimization of the learning rate is central to improving the efficiency and applicability of backpropagation learning. This paper investigates techniques that exploit the first four derivatives of the backpropagation error surface with respect to the learning rate. The derivatives are obtained from an extended feedforward propagation procedure and can be computed iteratively. A near-optimal dynamic learning rate is obtained with only a moderate increase in computational complexity per iteration, which scales in the same way as that of the plain backpropagation algorithm (BPA), yet the proposed method converges rapidly and reduces running time by at least an order of magnitude compared with the BPA.
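To illustrate the idea, the following is a minimal sketch (not the authors' exact procedure): the error along the negative-gradient direction, E(eta), is Taylor-expanded to fourth order in the learning rate using automatic differentiation, and the quartic is then minimized to pick a near-optimal step. The network, loss, and helper names below are illustrative assumptions, not taken from the paper.

```python
# Sketch: choose a near-optimal learning rate from the first four derivatives
# of the error with respect to the learning rate (assumed setup, JAX-based).
import jax
import jax.numpy as jnp
import numpy as np

def loss(params, x, y):
    # Squared error of a single-hidden-layer tanh network (illustrative).
    W1, b1, W2, b2 = params
    h = jnp.tanh(x @ W1 + b1)
    pred = h @ W2 + b2
    return 0.5 * jnp.mean((pred - y) ** 2)

def error_along_gradient(eta, params, grads, x, y):
    # E(eta): loss at the point reached by a gradient step of size eta.
    stepped = [p - eta * g for p, g in zip(params, grads)]
    return loss(stepped, x, y)

def near_optimal_eta(params, x, y, eta_max=1.0):
    grads = jax.grad(loss)(params, x, y)
    E = lambda eta: error_along_gradient(eta, params, grads, x, y)

    # First four derivatives of E with respect to eta, evaluated at eta = 0.
    d1 = jax.grad(E)
    d2 = jax.grad(d1)
    d3 = jax.grad(d2)
    d4 = jax.grad(d3)
    c = [float(f(0.0)) for f in (E, d1, d2, d3, d4)]

    # Fourth-order Taylor model:
    # E(eta) ~ c0 + c1*eta + c2*eta^2/2 + c3*eta^3/6 + c4*eta^4/24.
    # Its stationary points are the real roots of a cubic.
    dpoly = np.array([c[4] / 6.0, c[3] / 2.0, c[2], c[1]])
    roots = np.roots(dpoly)
    candidates = [r.real for r in roots
                  if abs(r.imag) < 1e-9 and 0.0 < r.real <= eta_max]
    if not candidates:
        return 1e-3  # fall back to a small fixed step
    taylor = lambda eta: (c[0] + c[1] * eta + c[2] * eta**2 / 2
                          + c[3] * eta**3 / 6 + c[4] * eta**4 / 24)
    return min(candidates, key=taylor)
```

The sketch uses nested automatic differentiation for the higher-order derivatives; the paper instead derives them analytically through an extended feedforward propagation, which avoids the overhead of repeated backward passes.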
