Components of variance in ROC analysis of CADx classifier performance: II. Applications of the bootstrap

We review components-of-variance models for the uncertainty in estimates of the area under the ROC curve, Az, for classical discriminants, in the case where the uncertainty must generalize to a population of training cases as well as to a population of test cases. A key observation from our previous work makes it possible to use resampling strategies to analyze a finite data set and classifier in terms of these models. In particular, we demonstrate the use of the statistical bootstrap in combination with a four-term variance model to solve for the contributions to the uncertainty in Az that result from a given finite training sample, a given finite test sample, and their interaction. The same analysis yields an expression that predicts how the uncertainty in estimates of Az would change with a given change in the number of training samples and in the number of test samples. This expression provides a quantitative design tool: from the results of a smaller pilot study, one can estimate the sample sizes that a larger pivotal study would require to achieve a desired precision in Az and the desired generalizability.
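The bootstrap decomposition described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's actual procedure: the classifier is a Fisher linear discriminant, Az is estimated nonparametrically with the Mann-Whitney statistic, and the split into training, test, and interaction components uses a generic two-way crossed random-effects decomposition of a matrix of bootstrap Az values. The four-term model and the sample-size expression referred to in the abstract are not reproduced here, and all function names are hypothetical.

```python
import numpy as np


def auc_mann_whitney(neg_scores, pos_scores):
    """Nonparametric Az: fraction of (negative, positive) pairs ranked correctly; ties count 1/2."""
    diff = pos_scores[:, None] - neg_scores[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def fisher_weights(x_neg, x_pos):
    """Fisher linear discriminant direction estimated from one training sample."""
    pooled = np.cov(x_neg, rowvar=False) + np.cov(x_pos, rowvar=False)
    return np.linalg.solve(pooled, x_pos.mean(axis=0) - x_neg.mean(axis=0))


def resample(x, rng):
    """Draw len(x) cases from x with replacement (one bootstrap replicate)."""
    return x[rng.integers(0, len(x), size=len(x))]


def bootstrap_az_components(train_neg, train_pos, test_neg, test_pos, n_boot=50, seed=0):
    """Assumed decomposition: Az[i, j] = mean + train_i + test_j + interaction_ij."""
    rng = np.random.default_rng(seed)
    # One trained discriminant per bootstrap training replicate.
    weights = [fisher_weights(resample(train_neg, rng), resample(train_pos, rng))
               for _ in range(n_boot)]
    # One bootstrap test replicate per column; every classifier sees every test set.
    tests = [(resample(test_neg, rng), resample(test_pos, rng)) for _ in range(n_boot)]
    az = np.array([[auc_mann_whitney(tn @ w, tp @ w) for tn, tp in tests]
                   for w in weights])

    row_means = az.mean(axis=1)   # average over test replicates
    col_means = az.mean(axis=0)   # average over training replicates
    resid = az - row_means[:, None] - col_means[None, :] + az.mean()

    # Method-of-moments estimates for a crossed design with one value per cell.
    var_inter = float((resid ** 2).sum() / (n_boot - 1) ** 2)
    var_train = max(float(row_means.var(ddof=1)) - var_inter / n_boot, 0.0)
    var_test = max(float(col_means.var(ddof=1)) - var_inter / n_boot, 0.0)
    return {
        "mean_az": float(az.mean()),
        "var_train": var_train,
        "var_test": var_test,
        "var_interaction": var_inter,
        "var_total": var_train + var_test + var_inter,
    }
```

Calling `bootstrap_az_components` with arrays of shape (cases, features) for each class returns the three variance estimates and their sum. Crossing every training replicate with every test replicate is what allows the interaction term to be separated from the two main effects in this sketch.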
