Comparing image detection algorithms using resampling

The ability to statistically compare the performance of two computer detection (CD) or computer-aided detection (CAD) algorithms is fundamental to the development and evaluation of medical image analysis tools. Automated detection tools for medical imaging are commonly characterized using free-response receiver operating characteristic (FROC) methods. However, few statistical tools are currently available for estimating statistical significance when comparing two FROC performance curves. In this study, we introduce a permutation method and a bootstrap resampling method for the nonparametric estimation of the statistical significance of differences in performance metrics between two FROC curves. We then provide an initial validation of the proposed methods using an area-under-the-FROC-curve performance metric and a simulation model for creating CD algorithm prompts. Validation is based on the Type I error rate obtained when comparing two statistically identical CD algorithms. The results of 10^4 Monte Carlo trials show that both the permutation and bootstrap methods produced excellent estimates of the expected Type I error rate.
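
To make the two resampling ideas concrete, the following is a minimal Python sketch and not the implementation used in the study. It assumes a paired design in which both algorithms are run on the same cases, a simple step-integrated area under the FROC curve up to a fixed number of false positives per image, label swapping within each case for the permutation test, and a two-sided percentile rule for the bootstrap p-value. The function names, the per-case data layout, and the `fp_max` parameter are all hypothetical choices made for illustration.

```python
import random

# Assumed data layout (hypothetical): one entry per case,
#   (n_lesions, [(score, is_true_positive), ...])
# with one list of cases per algorithm, in the same case order.

def froc_area(cases, fp_max=1.0):
    """Area under the FROC curve (lesion sensitivity versus false
    positives per image), integrated from 0 to fp_max FPs per image."""
    n_cases = len(cases)
    n_lesions = sum(n for n, _ in cases)
    if n_cases == 0 or n_lesions == 0:
        return 0.0
    # Pool all marks and sweep the threshold from high to low score.
    marks = sorted((m for _, ms in cases for m in ms), key=lambda m: -m[0])
    tp = fp = 0
    area = 0.0
    for _, is_tp in marks:
        if is_tp:
            tp += 1
            continue
        # Each false positive moves the curve right by 1 / n_cases.
        left = fp / n_cases
        right = min((fp + 1) / n_cases, fp_max)
        if right > left:
            area += (right - left) * tp / n_lesions
        fp += 1
        if fp / n_cases >= fp_max:
            break
    # Extend the final plateau out to fp_max if the marks ran out early.
    area += (fp_max - min(fp / n_cases, fp_max)) * tp / n_lesions
    return area

def permutation_pvalue(cases_a, cases_b, metric, n_resamples=2000, seed=0):
    """Two-sided paired permutation test: under the null hypothesis the
    algorithm labels are exchangeable within each case."""
    rng = random.Random(seed)
    observed = abs(metric(cases_a) - metric(cases_b))
    extreme = 0
    for _ in range(n_resamples):
        perm_a, perm_b = [], []
        for a, b in zip(cases_a, cases_b):
            if rng.random() < 0.5:  # swap labels for this case
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        if abs(metric(perm_a) - metric(perm_b)) >= observed:
            extreme += 1
    # The +1 terms count the observed labeling as one permutation.
    return (extreme + 1) / (n_resamples + 1)

def bootstrap_pvalue(cases_a, cases_b, metric, n_resamples=2000, seed=0):
    """Two-sided bootstrap p-value (assumed percentile rule): resample
    cases with replacement and count how often the metric difference
    falls on either side of zero."""
    rng = random.Random(seed)
    n = len(cases_a)
    below = above = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diff = (metric([cases_a[i] for i in idx])
                - metric([cases_b[i] for i in idx]))
        below += diff <= 0
        above += diff >= 0
    return min(1.0, 2 * (min(below, above) + 1) / (n_resamples + 1))
```

A call such as `permutation_pvalue(cases_a, cases_b, froc_area)` returns a p-value for the observed difference in FROC area. Repeating the test over many simulated pairs of statistically identical algorithms and counting how often the p-value falls below the nominal level gives an empirical Type I error rate, which is the kind of check described above.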