Efficient Backpropagation Learning Using Optimal Learning Rate and Momentum

Abstract This paper considers efficient backpropagation learning using a dynamically optimal learning rate (LR) and momentum factor (MF). A family of approaches exploiting the derivatives of the error with respect to the LR and MF is presented; these approaches do not explicitly compute first- and second-order derivatives in weight space, but instead use information gathered from the forward and backward procedures. The computational and storage burden of estimating the optimal LR and MF is at most triple that of the standard backpropagation algorithm (BPA); however, the backpropagation learning procedure can be accelerated with remarkable savings in running time. Extensive computer simulations provided in this paper indicate that at least an order of magnitude of savings in running time can be achieved with the present family of approaches. © 1997 Elsevier Science Ltd. All Rights Reserved.
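The abstract does not give the update rules themselves, but the general idea of estimating the LR and MF from derivatives obtained through ordinary forward and backward passes can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm: the quadratic error surface standing in for a network, the hyper-step size of 0.01, and all names (error, gradient, d_eta, d_mu) are assumptions made for the example.

# Illustrative sketch (assumed, not the authors' exact procedure): adapt the
# learning rate (LR, eta) and momentum factor (MF, mu) from derivatives of the
# error with respect to those two quantities, using only gradient evaluations
# (forward/backward passes), never an explicit Hessian.
import numpy as np

rng = np.random.default_rng(0)

# A tiny quadratic error surface E(w) = 0.5 * w^T A w - b^T w stands in for the
# network error; gradient() plays the role of one forward + backward pass.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

def error(w):
    return 0.5 * w @ A @ w - b @ w

def gradient(w):  # "backward pass"
    return A @ w - b

w = rng.normal(size=2)
delta_prev = np.zeros_like(w)
eta, mu = 0.1, 0.5  # initial LR and MF

for step in range(50):
    g = gradient(w)
    delta = -eta * g + mu * delta_prev  # standard BPA update with momentum
    w_new = w + delta

    # Derivatives of E(w + delta) with respect to eta and mu need only the
    # gradient at the trial point (one extra forward/backward pass):
    #   dE/deta = -g(w_new) . g      dE/dmu = g(w_new) . delta_prev
    g_new = gradient(w_new)
    d_eta = -g_new @ g
    d_mu = g_new @ delta_prev

    # A simple gradient step on eta and mu themselves, standing in for the
    # paper's optimal-LR/MF estimation.
    eta = max(eta - 0.01 * d_eta, 1e-4)
    mu = float(np.clip(mu - 0.01 * d_mu, 0.0, 0.99))

    w, delta_prev = w_new, delta
    if step % 10 == 0:
        print(f"step {step:2d}  E={error(w):.6f}  eta={eta:.4f}  mu={mu:.4f}")

The cost structure of this sketch is consistent with the abstract's claim: each iteration uses one extra gradient evaluation beyond standard backpropagation, so the per-step work stays within a small constant factor of the BPA.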
