Estimating MLP generalisation ability without a test set using fast, approximate leave-one-out cross-validation

When using MLP regression models, some method of estimating generalisation ability is required to identify badly overfitted and underfitted models. If data are limited, it may be impossible to spare enough of them for a test set, and leave-one-out cross-validation may be considered as an alternative means of estimating generalisation ability. However, this method is computationally intensive, so we suggest a faster, approximate version suitable for use with the MLP. The approximate method is tested on an artificial test problem and is then applied to a real modelling problem from the papermaking industry. The basic method appears to work quite well, but the approximation may be poor under certain conditions. These conditions, and possible means of improving the approximation, are discussed in some detail.
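The abstract does not spell out the approximation, so the following is only an illustrative sketch of one common way to avoid refitting a model n times, not necessarily the authors' method. It assumes the fitted model is linearised about its final weights, so that leave-one-out residuals can be estimated from the ordinary residuals and the leverages of the Jacobian, as e_i / (1 - h_ii). The function names, the `ridge` stabiliser and the synthetic data are all hypothetical. For a model that is linear in its parameters the shortcut is exact, which the example uses as a sanity check against brute-force refitting; for an MLP, `J` would be the Jacobian of the network outputs with respect to the weights, and the result would only be approximate.

```python
import numpy as np

def loo_residuals_linearised(J, residuals, ridge=0.0):
    """Estimate leave-one-out residuals from a local linearisation.

    J         : (n, p) Jacobian of model outputs w.r.t. parameters at the fit
    residuals : (n,) ordinary residuals y - f(x; w_hat)
    ridge     : optional stabiliser for an ill-conditioned J^T J (assumption)
    """
    JtJ = J.T @ J + ridge * np.eye(J.shape[1])
    # Leverages h_ii = diag(J (J^T J)^-1 J^T), computed row by row
    # without forming the full n x n hat matrix.
    h = np.einsum("ij,ij->i", J, np.linalg.solve(JtJ, J.T).T)
    return residuals / (1.0 - h)

def loo_residuals_brute_force(X, y):
    """Reference: refit a linear least-squares model n times."""
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        out[i] = y[i] - X[i] @ w
    return out

# Synthetic linear problem (hypothetical data) where the shortcut is exact.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fast = loo_residuals_linearised(X, y - X @ w_hat)
slow = loo_residuals_brute_force(X, y)

# Leave-one-out estimate of mean squared prediction error, both ways.
print(np.mean(fast ** 2), np.mean(slow ** 2))
```

The shortcut needs one fit and one p x p linear solve instead of n refits. Leverages close to 1, or a nearly singular J^T J, make the division unstable, which is one plausible way for a linearised approximation of this kind to become poor.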
