A preliminary comparison of different methods for observer performance estimation

In medical imaging, image quality is assessed by the degree to which a human observer can correctly perform a given diagnostic task. Image quality is therefore typically quantified using performance measures from decision/detection theory, such as the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). In this paper we compare five AUC estimation techniques widely used in the literature, including both parametric and non-parametric methods. We compare the methods by equivalence hypothesis testing, using a model observer as well as data sets from a previously published human observer study. The main conclusions of this work are: 1) if only a small number of images are scored, the AUC estimation methods cannot be distinguished from one another because of the large variability in the AUC estimates, regardless of whether image scores are reported on a continuous or a quantized scale; 2) if the number of scored images is large and image scores are reported on a continuous scale, all tested AUC estimation methods are statistically equivalent.
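As an illustration of the distinction between non-parametric and parametric AUC estimation, the following minimal sketch contrasts two textbook estimators on simulated continuous scores. It is not taken from this work: the abstract does not name the five methods compared, so the two estimators here (the Mann-Whitney pair-counting statistic and a simple binormal estimate computed from sample means and variances), the function names, the Gaussian score model, and the sample sizes are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def auc_mann_whitney(neg, pos):
    """Non-parametric AUC: fraction of (negative, positive) score pairs
    in which the positive case scores higher; ties count as 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_binormal(neg, pos):
    """Parametric AUC under the assumption that scores in both classes are
    Gaussian: AUC = Phi((mu_pos - mu_neg) / sqrt(var_pos + var_neg))."""
    d = statistics.mean(pos) - statistics.mean(neg)
    s = math.sqrt(statistics.variance(pos) + statistics.variance(neg))
    z = d / s
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate(n_per_class, rng, separation=1.0):
    """Hypothetical observer: unit-variance Gaussian scores for
    signal-absent (neg) and signal-present (pos) images."""
    neg = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    pos = [rng.gauss(separation, 1.0) for _ in range(n_per_class)]
    return neg, pos


rng = random.Random(0)
for n in (20, 1000):  # few vs. many scored images per class (assumed sizes)
    neg, pos = simulate(n, rng)
    print(n, round(auc_mann_whitney(neg, pos), 3), round(auc_binormal(neg, pos), 3))
```

Repeating the simulation with different seeds gives a rough sense of how much each estimate varies from one sample to the next at each sample size. This is only an informal check and does not replace the equivalence hypothesis testing used in this work.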