Estimating misclassification error: a closer look at cross-validation based methods

BackgroundTo estimate a classifier’s error in predicting future observations, bootstrap methods have been proposed as reduced-variation alternatives to traditional cross-validation (CV) methods based on sampling without replacement. Monte Carlo (MC) simulation studies aimed at estimating the true misclassification error conditional on the training set are commonly used to compare CV methods. We conducted an MC simulation study to compare a new method of bootstrap CV (BCV) to k-fold CV for estimating clasification error.FindingsFor the low-dimensional conditions simulated, the modest positive bias of k-fold CV contrasted sharply with the substantial negative bias of the new BCV method. This behavior was corroborated using a real-world dataset of prognostic gene-expression profiles in breast cancer patients. Our simulation results demonstrate some extreme characteristics of variance and bias that can occur due to a fault in the design of CV exercises aimed at estimating the true conditional error of a classifier, and that appear not to have been fully appreciated in previous studies. Although CV is a sound practice for estimating a classifier’s generalization error, using CV to estimate the fixed misclassification error of a trained classifier conditional on the training set is problematic. While MC simulation of this estimation exercise can correctly represent the average bias of a classifier, it will overstate the between-run variance of the bias.ConclusionsWe recommend k-fold CV over the new BCV method for estimating a classifier’s generalization error. The extreme negative bias of BCV is too high a price to pay for its reduced variance.

[1]  M. Kenward,et al.  An Introduction to the Bootstrap , 2007 .

[2]  Ashutosh Kumar Singh,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2010 .

[3]  Yudong D. He,et al.  Gene expression profiling predicts clinical outcome of breast cancer , 2002, Nature.

[4]  Dafydd Gibbon,et al.  1 User’s guide , 1998 .

[5]  Richard Simon,et al.  An evaluation of resampling methods for assessment of survival risk prediction in high‐dimensional settings , 2011, Statistics in medicine.

[6]  Van,et al.  A gene-expression signature as a predictor of survival in breast cancer. , 2002, The New England journal of medicine.

[7]  J. Friedman Regularized Discriminant Analysis , 1989 .

[8]  Wenjiang J. Fu,et al.  Estimating misclassification error with small samples via bootstrap cross-validation , 2005, Bioinform..

[9]  James J. Chen,et al.  Ensemble methods for classification of patients for personalized medicine with high-dimensional data , 2007, Artif. Intell. Medicine.

[10]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[11]  Annette M. Molinaro,et al.  Prediction error estimation: a comparison of resampling methods , 2005, Bioinform..

[12]  R. Tibshirani,et al.  Improvements on Cross-Validation: The 632+ Bootstrap Method , 1997 .

[13]  Geoffrey J McLachlan,et al.  Selection bias in gene extraction on the basis of microarray gene-expression data , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[14]  S. Mazumdar,et al.  The Utility of Structure–Activity Relationship (SAR) Models for Prediction and Covariate Selection in Developmental Toxicity: Comparative Analysis of Logistic Regression and Decision Tree Models , 2004, SAR and QSAR in environmental research.

[15]  Ji-Hyun Kim,et al.  Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap , 2009, Comput. Stat. Data Anal..

[16]  Xin Yao,et al.  Evolutionary ensembles with negative correlation learning , 2000, IEEE Trans. Evol. Comput..

[17]  B. Efron Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation , 1983 .

[18]  A. C. Davison,et al.  On the bias and variability of bootstrap and cross-validation estimates of error rate in discrimination problems , 1992 .

[19]  Danh V. Nguyen,et al.  Tumor classification by partial least squares using microarray gene expression data , 2002, Bioinform..