Effective Backpropagation Training with Variable Stepsize

The issue of variable stepsize in the backpropagation training algorithm has been widely investigated, and several techniques employing heuristic factors have been suggested to improve training time and reduce convergence to local minima. In this contribution, backpropagation training is based on a modified steepest descent method which allows a variable stepsize. The method is computationally efficient and possesses interesting convergence properties, utilizing estimates of the Lipschitz constant without any additional computational cost. The algorithm has been implemented and tested on several problems, and the results have been very satisfactory. Numerical evidence shows that the method is robust, with good average performance on many classes of problems. Copyright 1996 Elsevier Science Ltd.
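The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea: a steepest descent step whose stepsize is set from a two-point estimate of the local Lipschitz constant of the gradient, here taken as eta_k = 1/(2*Lambda_k) with Lambda_k = ||g_k - g_{k-1}|| / ||w_k - w_{k-1}||. The function name `variable_stepsize_descent`, the bootstrap stepsize `eta0`, and the toy quadratic objective are all illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def variable_stepsize_descent(w, grad_fn, n_iters=1000, eta0=1e-3, tol=1e-8):
    """Steepest descent with a stepsize adapted from a local Lipschitz-constant
    estimate (illustrative sketch; assumes eta_k = 1 / (2 * Lambda_k))."""
    g_prev = grad_fn(w)
    w_prev = w.copy()
    w = w - eta0 * g_prev                          # bootstrap with a small fixed stepsize
    for _ in range(n_iters):
        g = grad_fn(w)
        if np.linalg.norm(g) < tol:                # stop when the gradient is negligible
            break
        # Two-point Lipschitz estimate: ||g_k - g_{k-1}|| / ||w_k - w_{k-1}||
        dw = np.linalg.norm(w - w_prev)
        dg = np.linalg.norm(g - g_prev)
        lam = dg / dw if dw > 0 else 0.0
        eta = 1.0 / (2.0 * lam) if lam > 0 else eta0
        w_prev, g_prev = w.copy(), g
        w = w - eta * g                            # steepest descent step with variable stepsize
    return w

# Toy usage: minimise E(w) = 0.5 * w^T A w as a stand-in for a network error surface.
A = np.diag([1.0, 2.0])
w_opt = variable_stepsize_descent(np.array([5.0, -3.0]), grad_fn=lambda w: A @ w)
# w_opt should end up close to the minimiser at the origin.
```

Note that the Lipschitz estimate reuses the previous iterate and gradient, which is consistent with the abstract's claim that the stepsize adaptation incurs no additional computational cost beyond standard backpropagation.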
