An experimental comparison of cross-validation techniques for estimating the area under the ROC curve

Reliable estimation of the classification performance of inferred predictive models is difficult when working with small data sets. In such cases, cross-validation is the typical strategy for estimating performance. However, many standard approaches to cross-validation suffer from substantial bias or variance when the area under the ROC curve (AUC) is used as the performance measure. This issue is explored through an extensive simulation study. Leave-pair-out cross-validation is proposed for conditional AUC estimation, as it is almost unbiased and its variance is as low as that of the best alternative approaches. When regularized least-squares based learners are used, efficient algorithms exist for computing the leave-pair-out cross-validation estimate.
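
The following is a minimal sketch of leave-pair-out cross-validation (LPOCV) for AUC estimation, assuming a scikit-learn style estimator; the learner (RidgeClassifier), the helper function name, and the synthetic data are illustrative choices, not the setup used in the study. Each positive-negative pair is held out in turn, the model is trained on the remaining examples, and the AUC estimate is the fraction of held-out pairs ranked correctly (ties counted as 0.5).

```python
# Hypothetical sketch of leave-pair-out cross-validation for AUC estimation.
import numpy as np
from sklearn.linear_model import RidgeClassifier


def lpo_cv_auc(estimator, X, y):
    """Estimate AUC by leaving out one positive-negative pair at a time."""
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    wins = 0.0
    n_pairs = 0
    for i in pos:
        for j in neg:
            # Train on all examples except the held-out pair (i, j).
            train = np.delete(np.arange(len(y)), [i, j])
            model = estimator.fit(X[train], y[train])
            # Use real-valued decision scores to compare the pair.
            s_pos = model.decision_function(X[[i]])[0]
            s_neg = model.decision_function(X[[j]])[0]
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
            n_pairs += 1
    return wins / n_pairs


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X = rng.randn(30, 5)
    y = (X[:, 0] + 0.5 * rng.randn(30) > 0).astype(int)
    print("LPOCV AUC estimate:", lpo_cv_auc(RidgeClassifier(alpha=1.0), X, y))
```

Note that this naive version retrains the model once per positive-negative pair; the efficient algorithms referred to in the abstract avoid this repeated retraining for regularized least-squares learners.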
