A Study of Early Stopping and Model Selection Applied to the Papermaking Industry

This paper addresses the issues of neural network model development and maintenance in the context of a complex task taken from the papermaking industry. In particular, it describes a comparative study of early-stopping techniques and model selection, both aimed at optimising neural network models for generalisation performance. The results presented here show that early stopping via a Bayesian model-evidence measure is a viable way of optimising performance while making full use of the available data. In addition, they show that ten-fold cross-validation performs well both as a model selector and as an estimator of prediction accuracy. These results are important in that they show how neural network models may be optimally trained and selected for highly complex industrial tasks where the data are noisy and limited in number.
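
For readers unfamiliar with the model-selection procedure, the following is a minimal sketch of ten-fold cross-validation used to choose a network size and to estimate prediction accuracy, in the spirit of the study described above. It is not the paper's implementation: the synthetic data, the candidate hidden-layer widths, and the use of scikit-learn's validation-loss early stopping (in place of the Bayesian model-evidence measure studied in the paper) are all illustrative assumptions.

    # Sketch: ten-fold cross-validation for model selection and accuracy
    # estimation. Synthetic data and candidate widths are hypothetical;
    # validation-loss early stopping stands in for the paper's
    # Bayesian-evidence stopping criterion.
    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))              # noisy, limited data, as in the paper
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

    candidate_sizes = [2, 4, 8, 16]            # hypothetical hidden-layer widths
    kf = KFold(n_splits=10, shuffle=True, random_state=0)

    best_size, best_mse = None, np.inf
    for h in candidate_sizes:
        fold_mse = []
        for train_idx, test_idx in kf.split(X):
            net = MLPRegressor(hidden_layer_sizes=(h,),
                               early_stopping=True,   # stop on held-out validation loss
                               max_iter=2000, random_state=0)
            net.fit(X[train_idx], y[train_idx])
            pred = net.predict(X[test_idx])
            fold_mse.append(np.mean((pred - y[test_idx]) ** 2))
        mean_mse = float(np.mean(fold_mse))    # CV estimate of prediction error
        if mean_mse < best_mse:
            best_size, best_mse = h, mean_mse

    print(f"selected hidden size: {best_size}, CV MSE estimate: {best_mse:.4f}")

The mean of the per-fold test errors serves double duty here, exactly as in the study: it ranks the candidate models and simultaneously provides the cross-validated estimate of prediction accuracy for the selected one.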
