Comparison of classifier performance estimators: a simulation study

Using a Monte Carlo simulation study, we compare resampling-based estimators of the area under the ROC curve (AUC) of a classifier. The comparison is in terms of bias, variance, and mean square error. We also examine the corresponding variance estimators of these AUC estimators. We compare three AUC estimators: the hold-out (HO) estimator, the leave-one-out cross-validation (LOOCV) estimator, and the leave-pair-out bootstrap (LPOB) estimator, each of which has its own variability estimator. In our simulations, HO always has the largest mean square error, and the ranking of the other two estimators depends on the interplay of sample size, dimensionality, and population separability. In terms of estimator variability, LPOB is the least variable estimator and HO is the most variable. The results also show that estimating the variance of LPOB using the influence function approach with a finite data set is unbiased or conservatively biased, whereas the variance estimators of LOOCV and HO are downwardly (i.e., anti-conservatively) biased.
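
To make the three AUC estimators concrete, the sketch below illustrates one plausible implementation of each, using a logistic-regression classifier as a stand-in for the classifier under study. The helper name lpob_auc, the number of bootstrap replicates B, the data-generating settings, and the specific formulation of the leave-pair-out bootstrap (averaging the correct-ranking indicator over positive/negative pairs that are both outside each bootstrap training set) are illustrative assumptions, not the exact procedures of this study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split, LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n, p = 60, 5
X = rng.normal(size=(n, p))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

clf = LogisticRegression()

# Hold-out (HO): train on one split, score AUC on the held-out split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y, random_state=0)
ho_auc = roc_auc_score(y_te, clf.fit(X_tr, y_tr).decision_function(X_te))

# Leave-one-out cross-validation (LOOCV): each sample is scored by a model
# trained on all remaining samples; AUC is computed over the pooled scores.
loo_scores = cross_val_predict(clf, X, y, cv=LeaveOneOut(), method="decision_function")
loocv_auc = roc_auc_score(y, loo_scores)

# Leave-pair-out bootstrap (LPOB), sketched here as: for each bootstrap training
# set, average the correct-ranking indicator over (positive, negative) pairs that
# are both outside the bootstrap sample, then average over bootstrap replicates.
def lpob_auc(X, y, clf, B=200, rng=rng):
    n = len(y)
    total, used = 0.0, 0
    for _ in range(B):
        idx = rng.integers(0, n, size=n)           # bootstrap training indices
        out = np.setdiff1d(np.arange(n), idx)      # left-out (out-of-bag) samples
        pos, neg = out[y[out] == 1], out[y[out] == 0]
        if len(pos) == 0 or len(neg) == 0 or len(np.unique(y[idx])) < 2:
            continue  # skip replicates without usable left-out pairs
        s = clf.fit(X[idx], y[idx]).decision_function(X)
        total += (s[pos][:, None] > s[neg][None, :]).mean()
        used += 1
    return total / used

print(ho_auc, loocv_auc, lpob_auc(X, y, clf))
```

In this toy setting the three estimates typically differ noticeably from replicate to replicate, which is the kind of variability (and its estimation) that the simulation study quantifies.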