Test error bounds for classifiers: A survey of old and new results

In this paper, we focus our attention on one of the oldest problems in pattern recognition and machine learning: estimating the generalization error of a classifier from a test set. Although this problem has been studied for several decades, the last word has not yet been written, as new proposals continue to appear in the literature. Our objective is to survey and compare old and new techniques in terms of the quality of the estimate, ease of use, and rigor of the approach.
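To make concrete what a test-set error bound looks like, here is a minimal sketch of two classical one-sided upper bounds on the true error rate. Both take k errors observed on n test samples. The first is the Hoeffding bound. The second exactly inverts the binomial tail, in the spirit of the Clopper-Pearson interval. The sketch assumes the test samples are i.i.d. and independent of the training data. The function names and the bisection tolerance are our own illustrative choices, and the code is not claimed to reproduce the specific procedures compared in the paper.

```python
import math


def hoeffding_bound(errors: int, n: int, delta: float) -> float:
    """One-sided Hoeffding upper bound on the true error rate.

    Holds with probability at least 1 - delta, assuming the n test
    samples are i.i.d. and independent of the data used for training.
    """
    empirical = errors / n
    return min(1.0, empirical + math.sqrt(math.log(1.0 / delta) / (2.0 * n)))


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), assuming 0 < p < 1.

    Terms are computed in log space so that large n does not overflow.
    """
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return total


def binomial_tail_bound(errors: int, n: int, delta: float, tol: float = 1e-10) -> float:
    """Exact one-sided bound: the largest p with P(Binomial(n, p) <= errors) >= delta.

    The binomial CDF decreases in p, so the bound is located by bisection.
    For delta < 1/2 the CDF at the empirical rate errors/n is at least delta,
    so the lower end of the search interval starts in the feasible set.
    """
    if errors >= n:
        return 1.0  # every test point misclassified: only the trivial bound remains
    lo, hi = errors / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binomial_cdf(errors, n, mid) >= delta:
            lo = mid  # mid is still consistent with the data at level delta
        else:
            hi = mid
    return lo
```

For example, with 5 errors on 1000 test samples and delta = 0.05, one can compare `hoeffding_bound(5, 1000, 0.05)` with `binomial_tail_bound(5, 1000, 0.05)`. The exact binomial bound is never looser than the Hoeffding one, because Hoeffding's inequality is itself an upper bound on the binomial tail. The gap is largest when the observed error rate is close to zero.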