Dynamic learning rate optimization of the backpropagation algorithm

It has been observed by many authors that backpropagation (BP) error surfaces typically consist of a large number of flat regions as well as extremely steep ones, so the BP algorithm with a fixed learning rate is inefficient. This paper considers dynamic learning rate optimization of the BP algorithm using derivative information. An efficient method of deriving the first and second derivatives of the objective function with respect to the learning rate is explored; it does not involve explicit calculation of second-order derivatives in weight space, but instead uses information gathered during the forward and backward propagation. Several learning rate optimization approaches are then established, based respectively on a linear expansion of the actual outputs, line searches with an acceptable descent value, and Newton-like methods. Simultaneous determination of the optimal learning rate and momentum is also introduced by showing the equivalence between the momentum version of BP and the conjugate gradient method. Since these approaches are constructed by simple manipulations of the obtained derivatives, their computational and storage burden scales with the network size exactly as in the standard BP algorithm, and the convergence of the BP algorithm is accelerated, with a remarkable reduction (typically by a factor of 10 to 50, depending on the network architecture and application) in the running time of the overall learning process. Numerous computer simulation results are provided to support the present approaches.
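
For illustration only, the following is a minimal NumPy sketch of the linear-expansion idea described above: expanding the network outputs to first order along the gradient direction gives a closed-form learning rate for each step, without forming any second-order derivatives in weight space. The small one-hidden-layer network, the function names, and the finite-difference computation of the directional output derivative are all illustrative assumptions, used here as a stand-in for the paper's forward-propagation-based derivation, not the authors' implementation.

# Sketch (assumed setup, not the paper's exact algorithm): dynamic learning rate
# from a linear expansion of the outputs, for a one-hidden-layer MLP with
# sum-of-squares error E = 0.5*||Y - T||^2.
import numpy as np

def forward(w, X):
    W1, b1, W2, b2 = w
    h = np.tanh(X @ W1 + b1)           # hidden activations
    return h @ W2 + b2, h              # linear output layer

def gradients(w, X, T):
    W1, b1, W2, b2 = w
    Y, h = forward(w, X)
    e = Y - T                          # output error
    dW2 = h.T @ e
    db2 = e.sum(axis=0)
    dh = (e @ W2.T) * (1.0 - h**2)     # backpropagate through tanh
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    return [dW1, db1, dW2, db2], e

def directional_output_derivative(w, g, X, eps=1e-6):
    # Stand-in for computing J @ g (J = d outputs / d weights) from the
    # forward pass: a finite-difference approximation along -g.
    w_step = [wi - eps * gi for wi, gi in zip(w, g)]
    Y0, _ = forward(w, X)
    Y1, _ = forward(w_step, X)
    return (Y0 - Y1) / eps             # approximately J @ g

def optimal_learning_rate(w, X, T):
    g, e = gradients(w, X, T)
    Jg = directional_output_derivative(w, g, X)
    # Linear expansion: E(eta) ~ 0.5*||e - eta*J g||^2, minimized at
    # eta* = <J g, e> / ||J g||^2.
    num = np.sum(Jg * e)
    den = np.sum(Jg * Jg) + 1e-12
    return max(num / den, 0.0), g

# Usage: one gradient step with the dynamically chosen learning rate.
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 4)); T = rng.standard_normal((32, 1))
w = [rng.standard_normal((4, 8)) * 0.1, np.zeros(8),
     rng.standard_normal((8, 1)) * 0.1, np.zeros(1)]
eta, g = optimal_learning_rate(w, X, T)
w = [wi - eta * gi for wi, gi in zip(w, g)]

In this sketch the extra cost per step is one additional forward pass, which is consistent with the paper's claim that the computational and storage burden scales with the network size like standard BP.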
