Local Overfitting Control via Leverages

We present a novel approach to controlling overfitting in black-box models. It is based on the leverages of the samples, that is, on the influence that each observation has on the parameters of the model. Since overfitting is the consequence of the model specializing on specific data points during training, we propose a selection method for nonlinear models based on the estimation of leverages and confidence intervals. It allows both selection among models of equivalent complexity corresponding to different minima of the cost function (e.g., neural networks with the same number of hidden units) and selection among models of different complexity (e.g., neural networks with different numbers of hidden units). A complete model selection methodology is derived.
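To make the leverage computation concrete, here is a minimal Python sketch, not code from the article: the function names and the toy data are illustrative. It assumes the standard construction in which the leverages of a nonlinear model are estimated as the diagonal of the hat matrix Z(Z^T Z)^{-1}Z^T, where Z is the N-by-q Jacobian of the model outputs with respect to the q trained parameters, evaluated at the minimum of the cost function; these leverages then yield a "virtual" leave-one-out score and per-example confidence half-widths.

```python
import numpy as np
from scipy.stats import t as student_t

def leverages(Z):
    # Z: (N, q) Jacobian of model outputs w.r.t. the q parameters,
    # one row per training example, evaluated at the trained weights.
    # Leverage h_ii is the i-th diagonal element of the hat matrix
    # H = Z (Z^T Z)^{-1} Z^T, computed stably via the thin QR of Z:
    # with Z = Q R, H = Q Q^T, so h_ii is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(Z)
    return np.sum(Q * Q, axis=1)

def virtual_loo_rmse(residuals, h):
    # Virtual leave-one-out residual r_i / (1 - h_ii): exact for
    # linear-in-parameters models, a first-order approximation for
    # nonlinear ones. Returns the leave-one-out RMSE estimate.
    return np.sqrt(np.mean((residuals / (1.0 - h)) ** 2))

def confidence_half_widths(residuals, h, q, alpha=0.05):
    # Approximate half-width of a (1 - alpha) confidence interval on the
    # model output at each training point: t * s * sqrt(h_ii), where s^2
    # is the usual unbiased noise-variance estimate (N examples, q params).
    N = len(residuals)
    s = np.sqrt(np.sum(residuals ** 2) / (N - q))
    return student_t.ppf(1.0 - alpha / 2.0, N - q) * s * np.sqrt(h)

# Toy usage on a linear model y = X w + noise (hypothetical data);
# for a linear model the Jacobian Z is simply the design matrix X.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w, *_ = np.linalg.lstsq(X, y, rcond=None)
h = leverages(X)
r = y - X @ w
print(virtual_loo_rmse(r, h))           # leverage-corrected LOO score
print(confidence_half_widths(r, h, 3))  # per-example CI half-widths
```

Under these assumptions, candidate models (different minima, or different numbers of hidden units) would be ranked by the virtual leave-one-out score, with the leverages flagging examples that dominate the fit.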
