A Non-convergent On-Line Training Algorithm for Neural Networks

Stopped training is a method for avoiding over-fitting in neural network models by preventing an iterative optimization method from reaching a local minimum of the objective function. It is motivated by the observation that over-fitting sets in gradually as training progresses. The stopping time is typically determined by monitoring the expected generalization performance of the model, as approximated by the error on a validation set. In this paper we propose to use an analytic estimate for this purpose. However, such estimates require knowledge of the analytic form of the objective function used to train the network, and they are applicable only when the weights correspond to a local minimum of that objective function. For this reason, we propose the use of an auxiliary, regularized objective function. The resulting algorithm is "self-contained" and does not require splitting the data into a training set and a separate validation set.
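For readers unfamiliar with the baseline the paper builds on, the following is a minimal sketch of conventional stopped training (early stopping) with a held-out validation set, i.e. the scheme whose data split the proposed algorithm avoids. The data, model, learning rate, and patience window are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy linear data; a linear model suffices to show the mechanics.
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.5 * rng.normal(size=200)

# Split into training and validation sets -- the split the paper aims to avoid.
X_tr, y_tr = X[:150], y[:150]
X_va, y_va = X[150:], y[150:]

def mse(w, X, y):
    r = X @ w - y
    return float(r @ r) / len(y)

w = np.zeros(5)
lr, patience = 0.01, 10
best_val, best_w, bad_steps = np.inf, w.copy(), 0

for step in range(5000):
    # One gradient step on the training error.
    grad = 2.0 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad

    # Monitor validation error; halt once it has not improved
    # for `patience` consecutive steps, keeping the best weights seen.
    val = mse(w, X_va, y_va)
    if val < best_val:
        best_val, best_w, bad_steps = val, w.copy(), 0
    else:
        bad_steps += 1
        if bad_steps >= patience:
            break
```

The paper's contribution is to replace the validation-set monitor in this loop with an analytic generalization estimate computed from an auxiliary, regularized objective, so that all of the data can be used for training.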
