Repeatability in computer-aided diagnosis: Application to breast cancer diagnosis on sonography.

PURPOSE: The aim of this study was to investigate the concept of repeatability in a case-based performance evaluation of two classifiers commonly used in computer-aided diagnosis, in the task of distinguishing benign from malignant lesions.

METHODS: The authors performed .632+ bootstrap analyses using a data set of 1251 sonographic lesions, of which 212 were malignant. Several analyses investigated the impact of sample size and of the number of bootstrap iterations. The classifiers investigated were a Bayesian neural net (BNN) with five hidden units and linear discriminant analysis (LDA). Both used the same four input lesion features. While the authors did evaluate classifier performance using receiver operating characteristic (ROC) analysis, the main focus was on case-based performance, i.e., the classifier output for each individual test case as measured over the bootstrap iterations. In this case-based analysis, the authors examined the classifier output variability and linked it to the concept of repeatability. Repeatability was assessed at the level of individual cases, overall for all cases in the data set, and with regard to its dependence on the case-based classifier output. The impact of repeatability was studied both when aiming to operate at a constant sensitivity or specificity and when aiming to operate at a constant threshold value for the classifier output.

RESULTS: The BNN slightly outperformed the LDA, with an area under the ROC curve of 0.88 versus 0.85 (p<0.05). In the repeatability analysis on an individual case basis, it was evident that different cases posed different degrees of difficulty to each classifier, as measured by the by-case output variability. When considering the entire data set, however, the overall repeatability of the BNN classifier was lower than that of the LDA classifier, i.e., the by-case variability for the BNN was higher. The dependence of the by-case variability on the average by-case classifier output differed markedly between the two classifiers. The BNN achieved the lowest variability (best repeatability) when operating at high sensitivity (>90%) and low specificity (<66%), while the LDA achieved this at moderate sensitivity (∼74%) and specificity (∼84%). When operating at constant 90% sensitivity or constant 90% specificity, the width of the 95% confidence intervals for the corresponding classifier output was considerable for both classifiers and increased for smaller sample sizes. When operating at a constant threshold value for the classifier output, the width of the 95% confidence intervals for the corresponding sensitivity and specificity ranged from 9 percentage points (pp) to 30 pp.

CONCLUSIONS: The repeatability of the classifier output can have a substantial effect on the obtained sensitivity and specificity. Knowledge of classifier repeatability, in addition to overall performance level, is important for successful translation and implementation of computer-aided diagnosis in clinical decision making.
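
The case-based analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain out-of-bag bootstrap rather than the .632+ estimator, a hand-written two-class LDA, and synthetic four-feature data in place of the sonographic lesion features. All function names, the sample size, the class separation, and the threshold of 0.5 are assumptions chosen for illustration only.

```python
import numpy as np

def fit_lda(X, y):
    """Fit a two-class linear discriminant; returns (weights, bias)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance matrix
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    w = np.linalg.solve(S, mu1 - mu0)
    b = -0.5 * (mu1 + mu0) @ w + np.log(len(X1) / len(X0))
    return w, b

def lda_output(X, w, b):
    """Classifier output in [0, 1]: LDA posterior probability of class 1."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def bootstrap_case_outputs(X, y, n_iter=200, seed=0):
    """Return an (n_iter, n_cases) array of test outputs.

    In each iteration the classifier is trained on a bootstrap sample and
    applied to the cases left out of that sample; other entries stay NaN.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    out = np.full((n_iter, n), np.nan)
    for i in range(n_iter):
        train = rng.integers(0, n, size=n)            # sample with replacement
        test = np.setdiff1d(np.arange(n), train)      # out-of-bag cases
        w, b = fit_lda(X[train], y[train])
        out[i, test] = lda_output(X[test], w, b)
    return out

def repeatability_summary(out, y, threshold=0.5):
    """By-case mean and standard deviation of the output, plus 95% percentile
    intervals for sensitivity and specificity at a fixed output threshold."""
    case_mean = np.nanmean(out, axis=0)
    case_sd = np.nanstd(out, axis=0)                  # by-case variability
    sens, spec = [], []
    for row in out:
        tested = ~np.isnan(row)
        sens.append(np.mean(row[tested & (y == 1)] >= threshold))
        spec.append(np.mean(row[tested & (y == 0)] < threshold))
    sens_ci = np.percentile(sens, [2.5, 97.5])
    spec_ci = np.percentile(spec, [2.5, 97.5])
    return case_mean, case_sd, sens_ci, spec_ci

# Synthetic stand-in data (assumption): 400 cases, 68 "malignant", 4 features.
rng = np.random.default_rng(1)
n_cases, n_malignant = 400, 68
y = np.zeros(n_cases, dtype=int)
y[:n_malignant] = 1
X = rng.normal(size=(n_cases, 4)) + 0.8 * y[:, None]

out = bootstrap_case_outputs(X, y, n_iter=100)
case_mean, case_sd, sens_ci, spec_ci = repeatability_summary(out, y)
print("median by-case SD of classifier output:", np.median(case_sd))
print("95% interval, sensitivity at fixed threshold:", sens_ci)
print("95% interval, specificity at fixed threshold:", spec_ci)
```

Plotting `case_sd` against `case_mean` would give a rough analogue of the dependence of by-case variability on the average by-case classifier output, and the widths of `sens_ci` and `spec_ci` correspond to the constant-threshold scenario reported in the results.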
