
This paper concerns the use of real-valued functions for binary classification problems. Previous work in this area has concentrated on using as an error estimate the ‘resubstitution’ error (that is, the empirical error of a classifier on the training sample) or variants of it. In practice, however, cross-validation and related techniques are more popular. Here, we devise new holdout and cross-validation error estimators for the case where real-valued functions are used as classifiers, and we analyse their accuracy theoretically.
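The estimators studied in the paper are developed in its body; purely as an illustration of the quantities the abstract mentions, the following Python sketch computes the resubstitution error, a holdout estimate, and a k-fold cross-validation estimate for a real-valued function used as a binary classifier by thresholding at zero. All names here (`fit`, `classify`, the toy data, the fold count) are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch, assuming a real-valued function h thresholded at zero
# is used as a {-1, +1} classifier. Not the paper's estimators; an
# illustration of resubstitution, holdout, and k-fold CV error only.
import numpy as np

def classify(h, X):
    """Threshold the real-valued function h at zero to get labels in {-1, +1}."""
    return np.where(h(X) >= 0.0, 1, -1)

def error_rate(h, X, y):
    """Fraction of points misclassified by the thresholded function."""
    return float(np.mean(classify(h, X) != y))

def holdout_error(fit, X, y, frac=0.3, seed=None):
    """Train on one part of the sample, estimate error on the held-out part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(len(y) * (1.0 - frac))
    train, test = idx[:cut], idx[cut:]
    h = fit(X[train], y[train])
    return error_rate(h, X[test], y[test])

def kfold_cv_error(fit, X, y, k=5, seed=None):
    """Average held-out error over k folds; each fold is left out once."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        h = fit(X[train], y[train])
        errs.append(error_rate(h, X[test], y[test]))
    return float(np.mean(errs))

if __name__ == "__main__":
    # Toy data and a least-squares linear fit, purely for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200) >= 0, 1, -1)

    def fit(Xtr, ytr):
        # Least-squares fit of an affine real-valued function to the labels.
        w, *_ = np.linalg.lstsq(np.c_[Xtr, np.ones(len(ytr))], ytr, rcond=None)
        return lambda Xq: np.c_[Xq, np.ones(len(Xq))] @ w

    h = fit(X, y)
    print("resubstitution error:", error_rate(h, X, y))
    print("holdout error:       ", holdout_error(fit, X, y, seed=1))
    print("5-fold CV error:     ", kfold_cv_error(fit, X, y, k=5, seed=2))
```

The contrast the sketch is meant to show is the one the abstract draws: the resubstitution error is computed on the same sample used to fit the function, whereas the holdout and cross-validation estimates are computed on data withheld from fitting.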
