Repeatability in computer-aided diagnosis: Application to breast cancer diagnosis on sonography.

PURPOSE The aim of this study was to investigate the concept of repeatability in a case-based performance evaluation of two classifiers commonly used in computer-aided diagnosis in the task of distinguishing benign from malignant lesions.

METHODS The authors performed .632+ bootstrap analyses using a data set of 1251 sonographic lesions, of which 212 were malignant. Several analyses were performed to investigate the impact of sample size and of the number of bootstrap iterations. The classifiers investigated were a Bayesian neural net (BNN) with five hidden units and linear discriminant analysis (LDA). Both used the same four input lesion features. While the authors did evaluate classifier performance using receiver operating characteristic (ROC) analysis, the main focus was to investigate case-based performance based on the classifier output for individual cases, i.e., the classifier outputs for each test case measured over the bootstrap iterations. In this case-based analysis, the authors examined the classifier output variability and linked it to the concept of repeatability. Repeatability was assessed at the level of individual cases, overall for all cases in the data set, and in terms of its dependence on the case-based classifier output. The impact of repeatability was studied when aiming to operate at a constant sensitivity or specificity and when aiming to operate at a constant threshold value for the classifier output.

RESULTS The BNN slightly outperformed the LDA, with an area under the ROC curve of 0.88 versus 0.85 (p<0.05). In the repeatability analysis on an individual case basis, it was evident that different cases posed different degrees of difficulty to each classifier, as measured by the by-case output variability. When considering the entire data set, however, the overall repeatability of the BNN classifier was lower than that of the LDA classifier, i.e., the by-case variability for the BNN was higher.
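The bootstrap analysis described under METHODS can be illustrated with a minimal Python sketch. The sketch makes several assumptions:

- It uses NumPy.
- It replaces the sonographic lesions with synthetic Gaussian data (1251 cases, 212 positive, four features).
- It covers only the LDA classifier; the BNN is omitted.

In each iteration, the classifier is trained on cases drawn with replacement and then applied to the cases that were not drawn. Every case therefore collects a set of test outputs, and the spread of that set is its by-case variability. The following choices are illustrative assumptions, not the authors' implementation:

- converting the LDA score to a posterior probability;
- using a simplified .632+ weighting written for the AUC, with 0.5 as the no-information value;
- the helper names `fit_lda`, `lda_output`, `auc` and `bootstrap_632plus`.

```python
import numpy as np


def fit_lda(X, y):
    """Two-class LDA with pooled covariance; returns (w, b) so that w.x + b is the log-odds."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    pooled = ((n0 - 1) * np.cov(X0, rowvar=False)
              + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(pooled, mu1 - mu0)
    b = -0.5 * (mu0 + mu1) @ w + np.log(n1 / n0)  # prior odds from class frequencies
    return w, b


def lda_output(X, w, b):
    """Posterior probability of the positive class under the LDA Gaussian model (assumed scaling)."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def auc(scores, y):
    """Mann-Whitney estimate of the area under the ROC curve; ties count one half."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def bootstrap_632plus(X, y, n_iter=200, seed=0):
    """Return a simplified .632+ AUC estimate and the out-of-bag outputs collected per case."""
    rng = np.random.default_rng(seed)
    n = len(y)
    by_case = [[] for _ in range(n)]
    oob_aucs = []
    for _ in range(n_iter):
        train = rng.integers(0, n, size=n)          # n cases drawn with replacement
        test = np.setdiff1d(np.arange(n), train)    # cases never drawn (about 36.8%)
        if len(np.unique(y[train])) < 2 or len(np.unique(y[test])) < 2:
            continue                                 # both classes are needed in each set
        w, b = fit_lda(X[train], y[train])
        out = lda_output(X[test], w, b)
        oob_aucs.append(auc(out, y[test]))
        for case, value in zip(test, out):
            by_case[case].append(value)
    w, b = fit_lda(X, y)                             # apparent (resubstitution) AUC
    auc_app = auc(lda_output(X, w, b), y)
    auc_oob = float(np.mean(oob_aucs))
    # Relative overfitting rate with 0.5 as the no-information AUC, clipped to [0, 1]
    r = (auc_app - auc_oob) / (auc_app - 0.5) if auc_app > max(auc_oob, 0.5) else 0.0
    r = min(r, 1.0)
    weight = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - weight) * auc_app + weight * auc_oob, by_case


# Synthetic stand-in data, NOT the study's lesions: 1251 cases, 212 positive, 4 features
rng = np.random.default_rng(42)
y = np.zeros(1251, dtype=int)
y[:212] = 1
X = rng.normal(size=(1251, 4)) + 0.8 * y[:, None]

auc_632p, by_case = bootstrap_632plus(X, y)
# By-case mean and SD over the iterations in which each case was a test case
tested = [np.array(v) for v in by_case if len(v) > 1]
case_mean = np.array([v.mean() for v in tested])
case_sd = np.array([v.std(ddof=1) for v in tested])
print(f".632+ AUC: {auc_632p:.3f}")
print(f"median by-case SD of the LDA output: {np.median(case_sd):.4f}")
```

In this sketch, a smaller by-case SD corresponds to better repeatability. Rerunning it with fewer cases or a different `n_iter` shows how sample size and iteration count shift both the AUC estimate and the by-case spread.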
The dependence of the by-case variability on the average by-case classifier output differed markedly between the two classifiers. The BNN achieved the lowest variability (best repeatability) when operating at high sensitivity (>90%) and low specificity (<66%), while the LDA achieved this at moderate sensitivity (∼74%) and specificity (∼84%). When operating at a constant 90% sensitivity or a constant 90% specificity, the 95% confidence intervals for the corresponding classifier output were wide for both classifiers and became wider for smaller sample sizes. When operating at a constant threshold value for the classifier output, the widths of the 95% confidence intervals for the corresponding sensitivity and specificity ranged from 9 percentage points (pp) to 30 pp.

CONCLUSIONS The repeatability of the classifier output can have a substantial effect on the obtained sensitivity and specificity. Knowledge of classifier repeatability, in addition to overall performance level, is important for successful translation and implementation of computer-aided diagnosis in clinical decision making.
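The two operating modes compared in the RESULTS can be sketched in Python with NumPy, again on synthetic classifier outputs. As a simplification, the outputs are held fixed and only the cases are resampled, whereas the study retrains the classifier in every bootstrap iteration. The sketch tracks two things:

- For a constant 90% sensitivity, it records the threshold needed in each resample.
- For a constant threshold (chosen once on the full set), it records the resulting sensitivity and specificity.

Each quantity is summarized by the width of its 2.5th to 97.5th percentile interval. The data, the 300-case subset and the function names (`threshold_at_sensitivity`, `operating_point_ci_widths`, and so on) are hypothetical.

```python
import numpy as np


def threshold_at_sensitivity(scores, y, target=0.90):
    """Highest threshold t for which the rule 'output >= t' reaches the target sensitivity."""
    pos = np.sort(scores[y == 1])[::-1]         # positive-case outputs, highest first
    k = int(np.ceil(target * len(pos)))         # positives that must be called positive
    return pos[k - 1]


def sens_spec(scores, y, t):
    """Sensitivity and specificity of the rule 'output >= t'."""
    called = scores >= t
    return np.mean(called[y == 1]), np.mean(~called[y == 0])


def ci_width(values, level=0.95):
    """Width of the percentile interval (2.5th to 97.5th percentile for level=0.95)."""
    lo, hi = np.percentile(values, [50 * (1 - level), 50 * (1 + level)])
    return hi - lo


def operating_point_ci_widths(scores, y, n_iter=1000, seed=0):
    """CI widths for the threshold at 90% sensitivity and for sens/spec at a fixed threshold."""
    rng = np.random.default_rng(seed)
    n = len(y)
    fixed_t = threshold_at_sensitivity(scores, y)    # chosen once on the full set
    thr, sens, spec = [], [], []
    for _ in range(n_iter):
        idx = rng.integers(0, n, size=n)             # resample cases with replacement
        s, yy = scores[idx], y[idx]
        thr.append(threshold_at_sensitivity(s, yy))  # operate at constant sensitivity
        se, sp = sens_spec(s, yy, fixed_t)           # operate at constant threshold
        sens.append(se)
        spec.append(sp)
    return ci_width(thr), ci_width(sens), ci_width(spec)


# Hypothetical classifier outputs on synthetic data: 1251 cases, 212 positive
rng = np.random.default_rng(7)
y = np.zeros(1251, dtype=int)
y[:212] = 1
scores = rng.normal(size=1251) + 1.6 * y

full = operating_point_ci_widths(scores, y)
subset = rng.choice(1251, size=300, replace=False)   # a smaller data set
small = operating_point_ci_widths(scores[subset], y[subset])
print(f"threshold at 90% sensitivity: width {full[0]:.3f} (n=1251), {small[0]:.3f} (n=300)")
print(f"sensitivity at fixed threshold: {100 * full[1]:.1f} pp vs {100 * small[1]:.1f} pp")
print(f"specificity at fixed threshold: {100 * full[2]:.1f} pp vs {100 * small[2]:.1f} pp")
```

In this sketch, all three widths grow when the data set shrinks, in line with the observation that the confidence intervals became wider for smaller sample sizes. The printed numbers describe the synthetic data only and are not estimates of the BNN or LDA results.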

Lorenzo L. Pesce | M. Giger | K. Drukker