On the comparison of FROC curves in mammography CAD systems

We present a novel method for assessing the performance of computer-aided detection (CAD) systems on unseen cases at a given sensitivity level. The sampling error introduced by training the system on a limited data set is captured as the uncertainty in determining the system threshold that would yield a predetermined sensitivity on unseen data sets. By estimating the distribution of system thresholds, we construct a confidence interval for the expected number of false positive markings per image at a given sensitivity. We present two alternative procedures for estimating the probability density functions needed to construct this confidence interval. The first is based on the common assumption that the number of false positive markings per image is Poisson distributed; it also relies on the assumption that false positives and sensitivity are independent. The second procedure is nonparametric and relaxes that independence assumption: it applies the bootstrap to the data generated in the leave-one-out construction of the FROC curve, and is a fast and robust way of obtaining the desired confidence interval. Standard FROC curve analysis does not account for the uncertainty in setting the system threshold, so the proposed method allows a fairer comparison of different systems. The resulting confidence intervals are surprisingly wide. For our system, a conventional FROC curve analysis yields 0.47 false positive markings per image at 90% sensitivity, whereas the 90% confidence interval for the number of false positive markings per image is (0.28, 1.02) with the parametric procedure and (0.27, 1.04) with the nonparametric bootstrap. Because of its computational simplicity and the fairer comparisons it permits, we propose this method as a complement to traditionally presented FROC curves.
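To make the nonparametric procedure concrete, the sketch below bootstraps whole cases from a leave-one-out run to obtain a percentile confidence interval for the false positive rate at a fixed sensitivity. This is only an illustration under assumed data structures: the per-case layout (`lesion_scores`, `fp_scores`, `n_images`) and the function names are ours, not the paper's actual interface, and the paper's procedure may differ in detail.

```python
import numpy as np

def fp_rate_at_sensitivity(cases, target_sens):
    """Find the threshold at which the pooled lesion sensitivity reaches
    `target_sens`, then return the mean number of false positive
    markings per image at that threshold."""
    lesion_scores = np.concatenate([c["lesion_scores"] for c in cases])
    # Lowering the threshold to this quantile marks ~target_sens of lesions.
    thr = np.quantile(lesion_scores, 1.0 - target_sens)
    n_images = sum(c["n_images"] for c in cases)
    n_fp = sum(np.sum(np.asarray(c["fp_scores"]) >= thr) for c in cases)
    return n_fp / n_images

def bootstrap_fp_ci(cases, target_sens=0.9, n_boot=2000, alpha=0.10, seed=0):
    """Percentile-bootstrap confidence interval for false positive
    markings per image at a fixed sensitivity, resampling whole cases."""
    rng = np.random.default_rng(seed)
    n = len(cases)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Resample cases with replacement and recompute the statistic,
        # so the threshold is re-estimated on every bootstrap replicate.
        sample = [cases[i] for i in rng.integers(0, n, size=n)]
        stats[b] = fp_rate_at_sensitivity(sample, target_sens)
    return np.quantile(stats, [alpha / 2.0, 1.0 - alpha / 2.0])

# Hypothetical usage: `cases` holds the candidate scores saved during the
# leave-one-out FROC construction, one dict per case.
# cases = [{"lesion_scores": [...], "fp_scores": [...], "n_images": 4}, ...]
# lo, hi = bootstrap_fp_ci(cases, target_sens=0.9)
```

Resampling cases rather than individual markings keeps each case's detected lesions and its false positives together, which preserves exactly the coupling between sensitivity and false positive count that the parametric procedure's independence assumption discards.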
