No Free Lunch for Early Stopping

We show that, under a uniform prior on models with the same training error, early stopping at any fixed training error above the training-error minimum increases the expected generalization error.
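As a rough numerical illustration of this claim (a sketch, not the paper's own construction), the Python snippet below fits a linear model, samples weight vectors whose training error sits a fixed amount c above the minimum, and averages their test error. All problem sizes, noise levels, and helper names here are illustrative assumptions, and the whitened-sphere sampling only approximates a uniform prior on each training-error level set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear-regression setup (sizes and noise level are arbitrary choices).
d, n_train, n_test, sigma = 10, 50, 10_000, 0.5
w_star = rng.normal(size=d)
X_tr = rng.normal(size=(n_train, d))
y_tr = X_tr @ w_star + sigma * rng.normal(size=n_train)
X_te = rng.normal(size=(n_test, d))
y_te = X_te @ w_star + sigma * rng.normal(size=n_test)

def err(X, y, w):
    """Mean-squared error of weights w on dataset (X, y)."""
    r = X @ w - y
    return r @ r / len(y)

# Least-squares minimizer of the training error, and the minimum itself.
w_hat, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
e_min = err(X_tr, y_tr, w_hat)

# For a linear model, E_train(w_hat + delta) = e_min + delta^T S delta with
# S = X^T X / n, so weights with E_train = e_min + c lie on an ellipsoid.
S = X_tr.T @ X_tr / n_train
A = np.linalg.cholesky(S)  # S = A A^T: the ellipsoid is a sphere in A^T-coordinates

def sample_level_set(c, n_samples=2000):
    """Sample weights with training error e_min + c (isotropic directions in
    whitened coordinates; an approximation to a uniform prior on the level set)."""
    g = rng.normal(size=(n_samples, d))
    u = np.sqrt(c) * g / np.linalg.norm(g, axis=1, keepdims=True)
    # Solve A^T delta = u, so that delta^T S delta = ||u||^2 = c.
    delta = np.linalg.solve(A.T, u.T).T
    return w_hat + delta

for c in [0.0, 0.05, 0.1, 0.2, 0.4]:
    ws = sample_level_set(c)
    e_test = np.mean([err(X_te, y_te, w) for w in ws])
    print(f"E_train = e_min + {c:.2f}  ->  avg E_test ~ {e_test:.3f}")
```

Under these assumptions the printed average test error grows with c: the c = 0 row is the unregularized least-squares fit, and every stopping level above the minimum does worse in expectation, consistent with the statement above.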
