An Online Backpropagation Algorithm with Validation Error-Based Adaptive Learning Rate

We present a new learning algorithm for feed-forward neural networks based on standard Backpropagation with an adaptive global learning rate. The adaptation is driven by the evolution of the error criterion but, in contrast to most other approaches, our method measures this error on the validation set rather than the training set. The examples of the validation set are never used directly for training the network, so the set retains its original purpose of validating the training and of performing "early stopping". The proposed algorithm is a heuristic method consisting of two phases. In the first phase, the learning rate is adjusted after each iteration so that a minimum of the error criterion on the validation set is quickly attained. In the second phase, this search is refined by repeatedly reverting to previous weight configurations and decreasing the global learning rate. We show experimentally that the proposed method converges rapidly and that it outperforms standard Backpropagation in terms of generalization when the size of the training set is reduced.
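The abstract describes the heuristic only qualitatively. A minimal sketch of the idea on a toy linear-regression task is given below; the increase/decrease factors, epoch counts, and the phase-switch rule are illustrative assumptions, not values taken from the paper. Note that the validation set is only evaluated, never trained on.

```python
import numpy as np

# Sketch of validation-error-driven adaptation of a global learning rate.
# All multipliers (1.1, 0.5) and loop lengths are assumed for illustration.

rng = np.random.default_rng(0)

# Toy task: learn y = Xw with a linear model (stands in for a network).
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true
X_train, y_train = X[:100], y[:100]
X_val, y_val = X[100:], y[100:]

def val_error(w):
    # Validation error is only *measured*, never used to compute gradients.
    return float(np.mean((X_val @ w - y_val) ** 2))

def train_grad(w):
    # Mean-squared-error gradient on the training set.
    return 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)

w = np.zeros(5)
lr = 0.01
best_w, best_err = w.copy(), val_error(w)

# Phase 1: after each pass, grow the learning rate while the validation
# error keeps dropping; on an increase, revert the weights and shrink it.
for epoch in range(200):
    w_new = w - lr * train_grad(w)
    err = val_error(w_new)
    if err < best_err:                      # progress: accept, accelerate
        w, best_w, best_err = w_new, w_new.copy(), err
        lr *= 1.1
    else:                                   # overshoot: revert, decelerate
        w = best_w.copy()
        lr *= 0.5

# Phase 2: refine by repeatedly restarting from the best-so-far weights
# with an ever-smaller learning rate.
for _ in range(5):
    lr *= 0.5
    for epoch in range(50):
        w_new = w - lr * train_grad(w)
        err = val_error(w_new)
        if err < best_err:
            w, best_w, best_err = w_new, w_new.copy(), err
        else:
            w = best_w.copy()

print(best_err)
```

On this linearly realizable toy problem the adaptive schedule drives the validation error close to zero; the revert-and-halve step plays the role of the weight-configuration rollback described above.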
