Improving neural network training solutions using regularisation

Abstract: This paper describes the application of regularisation to the training of feedforward neural networks as a means of improving the quality of the solutions obtained. The basic principles of regularisation theory are outlined for both linear and nonlinear training and then extended to a new hybrid training algorithm for feedforward neural networks recently proposed by the authors. The concept of functional regularisation is also introduced and discussed in relation to multilayer perceptron (MLP) and radial basis function (RBF) networks. The tendency of the hybrid training algorithm, and of many linear optimisation strategies, to generate large-magnitude weight solutions when applied to ill-conditioned neural paradigms is illustrated graphically and explained analytically. Although such weight solutions do not generally result in poor fits, it is argued that they may be subject to numerical instability and are therefore undesirable. An illustrative example shows that, as well as being beneficial from a generalisation perspective, regularisation provides a means of controlling the magnitude of solutions.
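
The effect of regularisation on weight magnitude can be illustrated with a minimal sketch. This is not the authors' hybrid algorithm: it is ordinary ridge (Tikhonov) regularisation applied to the linear output weights of a Gaussian RBF network. The target function, noise level, centres, basis width and regularisation parameter `lam` are all assumptions chosen only to produce an ill-conditioned design matrix.

```python
import numpy as np


def gaussian_design_matrix(x, centres, width):
    """Design matrix of Gaussian basis functions (one column per centre)."""
    return np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2.0 * width ** 2))


def ridge_weights(Phi, y, lam):
    """Solve the regularised normal equations (Phi^T Phi + lam I) w = Phi^T y."""
    n_basis = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_basis), Phi.T @ y)


def rms(v):
    """Root-mean-square value of a vector."""
    return float(np.sqrt(np.mean(v ** 2)))


# Hypothetical 1-D regression problem (assumed for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = np.sin(np.pi * x) + 0.05 * rng.standard_normal(x.size)

# Wide, heavily overlapping basis functions give an ill-conditioned Phi.
centres = np.linspace(-1.0, 1.0, 15)
Phi = gaussian_design_matrix(x, centres, width=0.5)

# Unregularised linear least squares versus ridge regression.
w_ls = np.linalg.lstsq(Phi, y, rcond=None)[0]
w_ridge = ridge_weights(Phi, y, lam=1e-3)

print("condition number of Phi:", np.linalg.cond(Phi))
print("weight norm, least squares:", np.linalg.norm(w_ls))
print("weight norm, ridge:", np.linalg.norm(w_ridge))
print("RMS residual, least squares:", rms(Phi @ w_ls - y))
print("RMS residual, ridge:", rms(Phi @ w_ridge - y))
```

Under these assumptions, the ridge solution should have a smaller weight norm than the unregularised one while still fitting the data closely. The size of the gap depends on the conditioning of `Phi` and on the choice of `lam`.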
