
This paper concerns the use of real-valued functions for binary classification problems. Previous work in this area has concentrated on using as an error estimate the ‘resubstitution’ error (that is, the empirical error of a classifier on the training sample) or variants of it. In practice, however, cross-validation and related techniques are more popular. Here, we devise new holdout and cross-validation error estimators for the case where real-valued functions are used as classifiers, and we analyse their accuracy theoretically.
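The estimators studied in the paper are developed in its body; purely as an illustration of the quantities the abstract mentions, the following Python sketch computes the resubstitution error, a holdout estimate, and a k-fold cross-validation estimate for a real-valued function used as a binary classifier by thresholding at zero. All names here (`fit`, `classify`, the toy data, the fold count) are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch, assuming a real-valued function h thresholded at zero
# is used as a {-1, +1} classifier. Not the paper's estimators; an
# illustration of resubstitution, holdout, and k-fold CV error only.
import numpy as np

def classify(h, X):
    """Threshold the real-valued function h at zero to get labels in {-1, +1}."""
    return np.where(h(X) >= 0.0, 1, -1)

def error_rate(h, X, y):
    """Fraction of points misclassified by the thresholded function."""
    return float(np.mean(classify(h, X) != y))

def holdout_error(fit, X, y, frac=0.3, seed=None):
    """Train on one part of the sample, estimate error on the held-out part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(len(y) * (1.0 - frac))
    train, test = idx[:cut], idx[cut:]
    h = fit(X[train], y[train])
    return error_rate(h, X[test], y[test])

def kfold_cv_error(fit, X, y, k=5, seed=None):
    """Average held-out error over k folds; each fold is left out once."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        h = fit(X[train], y[train])
        errs.append(error_rate(h, X[test], y[test]))
    return float(np.mean(errs))

if __name__ == "__main__":
    # Toy data and a least-squares linear fit, purely for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200) >= 0, 1, -1)

    def fit(Xtr, ytr):
        # Least-squares fit of an affine real-valued function to the labels.
        w, *_ = np.linalg.lstsq(np.c_[Xtr, np.ones(len(ytr))], ytr, rcond=None)
        return lambda Xq: np.c_[Xq, np.ones(len(Xq))] @ w

    h = fit(X, y)
    print("resubstitution error:", error_rate(h, X, y))
    print("holdout error:       ", holdout_error(fit, X, y, seed=1))
    print("5-fold CV error:     ", kfold_cv_error(fit, X, y, k=5, seed=2))
```

The contrast the sketch is meant to show is the one the abstract draws: the resubstitution error is computed on the same sample used to fit the function, whereas the holdout and cross-validation estimates are computed on data withheld from fitting.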
