Ten More Years of Error Rate Research

The assessment of the performance of supervised classification rules by estimating their error rate (the proportion of objects misclassified) is an important area of work in statistical pattern recognition. This paper reviews the last ten years of error rate research, bringing up to date the reviews of Hand (1986a) and McLachlan (1987). Since those surveys were published, old estimators have been improved, new estimators have been introduced, and new approaches to error rate estimation have been developed. Some of this work has led to deep insights into classification methodology and statistical modelling in general.

[1]  Gerhard E. Tutz,et al.  Smoothed additive estimators for non-error rates in multiple discriminant analysis , 1985, Pattern Recognit..

[2]  Harris Drucker,et al.  Improving Regressors using Boosting Techniques , 1997, ICML.

[3]  Josef Kittler,et al.  Statistical Properties of Error Estimators in Performance Assessment of Recognition Systems , 1982, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Keinosuke Fukunaga,et al.  Bayes Error Estimation Using Parzen and k-NN Procedures , 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Carolyn Rutter,et al.  Bias in error rate estimates in discriminant analysis when stepwise variable selection is employed , 1991 .

[6]  G. McLachlan Error Rate Estimation in Discriminant Analysis: Recent Advances , 1987 .

[7]  Luc Devroye,et al.  Distribution-free performance bounds for potential function rules , 1979, IEEE Trans. Inf. Theory.

[8]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[9]  Alvin C. Rencher,et al.  Bias in apparent classification rates in stepwise discriminant analysis , 1992 .

[10]  L. Breiman Out-of-Bag Estimation , 1996 .

[11]  Sholom M. Weiss,et al.  Small Sample Error Rate Estimation for k-NN Classifiers , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Carsten Lanquillon,et al.  Evaluating Usefulness for Dynamic Classification , 1998, KDD.

[13]  Anil K. Jain,et al.  Bootstrap Techniques for Error Estimation , 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  David J. Hand,et al.  Error rate estimation by mixture decomposition , 1987 .

[15]  Gábor Lugosi,et al.  Strong minimax lower bounds for learning , 1996, COLT '96.

[16]  Yoav Freund,et al.  Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.

[17]  D. J. Hand An optimal error rate estimator based on average conditional error rate: Asymptotic results , 1986, Pattern Recognit. Lett..

[18]  Godfried T. Toussaint,et al.  Bibliography on estimation of misclassification , 1974, IEEE Trans. Inf. Theory.

[19]  John Shawe-Taylor Sample sizes for sigmoidal neural networks , 1995, COLT '95.

[20]  Vladimir Vapnik,et al.  On the uniform convergence of relative frequencies of events to their probabilities , 1971 .

[21]  G. McLachlan Discriminant Analysis and Statistical Pattern Recognition , 1992 .

[22]  B. Efron Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation , 1983 .

[23]  Wojtek J. Krzanowski,et al.  Error-rate estimation in two-group discriminant analysis using the linear discriminant function , 1990 .

[24]  David A. Cohn,et al.  Can Neural Networks Do Better Than the Vapnik-Chervonenkis Bounds? , 1990, NIPS.

[25]  David Hirst Error-rate estimation in multiple-group linear discriminant analysis , 1996 .

[26]  M. R. Mickey,et al.  Estimation of Error Rates in Discriminant Analysis , 1968 .

[27]  David J. Hand,et al.  Construction and Assessment of Classification Rules , 1997 .

[28]  Ron Kohavi,et al.  A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection , 1995, IJCAI.

[29]  D. J. Hand,et al.  Recent advances in error rate estimation , 1986, Pattern Recognit. Lett..

[30]  John Shawe-Taylor,et al.  Structural Risk Minimization Over Data-Dependent Hierarchies , 1998, IEEE Trans. Inf. Theory.

[31]  Dale Schuurmans Characterizing Rational Versus Exponential Learning Curves , 1997, J. Comput. Syst. Sci..

[32]  J D Knoke,et al.  Estimation of error rates in discriminant analysis with selection of variables. , 1989, Biometrics.

[33]  Sholom M. Weiss,et al.  An Empirical Comparison of Pattern Recognition, Neural Nets, and Machine Learning Classification Methods , 1989, IJCAI.

[34]  Niall M. Adams,et al.  Comparing classifiers when the misallocation costs are uncertain , 1999, Pattern Recognit..

[35]  Qiuming Zhu On the minimum probability of error of classification with incomplete patterns , 1990, Pattern Recognit..

[36]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[37]  David Haussler,et al.  What Size Net Gives Valid Generalization? , 1989, Neural Computation.

[38]  Niall M. Adams,et al.  The impact of changing populations on classifier performance , 1999, KDD '99.

[39]  Charles Elkan,et al.  Estimating the Accuracy of Learned Concepts , 1993, IJCAI.

[40]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[41]  Keinosuke Fukunaga,et al.  Nonparametric Bayes error estimation using unclassified samples , 1972, IEEE Trans. Inf. Theory.

[42]  Gerald Tesauro,et al.  How Tight Are the Vapnik-Chervonenkis Bounds? , 1992, Neural Computation.

[43]  S. T. Buckland,et al.  An Introduction to the Bootstrap. , 1994 .

[44]  David J. Hand,et al.  A Monte Carlo study of the 632 bootstrap estimator of error rate , 1991 .

[45]  David J. Hand,et al.  Assessing Error Rate Estimators: The Leave-One-Out Method Reconsidered , 1997 .

[46]  Gail Gong Cross-Validation, the Jackknife, and the Bootstrap: Excess Error Estimation in Forward Logistic Regression , 1986 .

[47]  Alberto Ruiz A nonparametric bound for the Bayes error , 1995, Pattern Recognit..

[48]  Keinosuke Fukunaga,et al.  Estimation of Classifier Performance , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[49]  Gábor Lugosi,et al.  Nonparametric estimation via empirical risk minimization , 1995, IEEE Trans. Inf. Theory.

[50]  Sarunas Raudys On the accuracy of a bootstrap estimate of the classification error , 1988, [1988 Proceedings] 9th International Conference on Pattern Recognition.

[51]  D. J. Hand,et al.  A comparison of two average conditional error rate estimators , 1987, Pattern Recognit. Lett..

[52]  R. Nakano,et al.  Estimating expected error rates of neural network classifiers in small sample size situations: a comparison of cross-validation and bootstrap , 1995, Proceedings of ICNN'95 - International Conference on Neural Networks.

[53]  David Haussler,et al.  Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.

[54]  Peter L. Bartlett,et al.  Learning in Neural Networks: Theoretical Foundations , 1999 .

[55]  Peter L. Bartlett,et al.  The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network , 1998, IEEE Trans. Inf. Theory.

[56]  G. J. McLachlan,et al.  On the Relationship between the F Test and the Overall Error Rate for Variable Selection in Two-Group Discriminant Analysis , 1980 .

[57]  Gábor Lugosi,et al.  On the posterior-probability estimate of the error rate of nonparametric classification rules , 1993, IEEE Trans. Inf. Theory.

[58]  B. Efron,et al.  A Leisurely Look at the Bootstrap, the Jackknife, and , 1983 .

[59]  Dana Ron,et al.  Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation , 1997, Neural Computation.

[60]  David J. Hand A shrunken leaving-one-out estimator of error rate , 1987 .

[61]  Mahesan Niranjan,et al.  On the Practical Applicability of VC Dimension Bounds , 1995, Neural Computation.

[62]  C. A. Smith Some examples of discrimination. , 1947, Annals of eugenics.

[63]  James D. Knoke,et al.  Bootstrapped and smoothed classification error rate estimators , 1988 .

[64]  B. Efron The jackknife, the bootstrap, and other resampling plans , 1987 .

[65]  David Haussler,et al.  Predicting {0,1}-functions on randomly drawn points , 1988, COLT '88.

[66]  Ljubomir J. Buturovic,et al.  Improving k-nearest neighbor density and error estimates , 1993, Pattern Recognit..

[67]  Yoav Freund,et al.  Boosting a weak learning algorithm by majority , 1995, COLT '90.

[68]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, STOC '84.

[69]  Sean B. Holden PAC-like upper bounds for the sample complexity of leave-one-out cross-validation , 1996, COLT '96.

[70]  Ned Glick,et al.  Additive estimators for probabilities of correct classification , 1978, Pattern Recognit..

[71]  Miroslaw Pawlak,et al.  On the asymptotic properties of smoothed estimators of the classification error rate , 1988, Pattern Recognit..

[72]  John Shawe-Taylor,et al.  Sample sizes for multiple-output threshold networks , 1991 .

[73]  Luc Devroye,et al.  Distribution-free inequalities for the deleted and holdout error estimates , 1979, IEEE Trans. Inf. Theory.

[74]  S. Snapinn,et al.  An Evaluation of Smoothed Classification Error-Rate Estimators , 1985 .

[75]  P. Hall On the Non‐Parametric Estimation of Mixture Proportions , 1981 .

[76]  Peter J. W. Rayner,et al.  Generalization and PAC learning: some new results for the class of generalized single-layer networks , 1995, IEEE Trans. Neural Networks.