A scaling transformation for classifier output based on likelihood ratio: applications to a CAD workstation for diagnosis of breast cancer.

PURPOSE: The authors developed scaling methods that monotonically transform the output of one classifier to the "scale" of another. Such transformations change the distribution of classifier output while leaving the ROC curve unchanged. In particular, the authors investigated transformations between radiologists and computer classifiers, with the goal of making case-specific output values from two classifiers directly comparable and interpretable.

METHODS: Using both simulated data and radiologists' rating data from breast imaging cases, the authors investigated a likelihood-ratio scaling transformation, which maps each output value of one classifier to the output value of the other classifier that has the same likelihood ratio. For comparison, three other scaling transformations were investigated, based on matching the classifiers' true positive fraction, false positive fraction, or cumulative distribution function, respectively. The authors explored modifying the computer output to reflect the scale of the radiologist, as well as modifying the radiologist's ratings to reflect the scale of the computer. They also evaluated how dataset size affects the transformations.

RESULTS: When the ROC curves of two classifiers differed substantially, the four transformations were found to be quite different. The likelihood-ratio scaling transformation was found to vary widely from radiologist to radiologist, and similar variation was found for the other transformations. Simulations explored the effect of dataset size on the accuracy with which the scaling transformations are estimated.

CONCLUSIONS: The likelihood-ratio scaling transformation that the authors developed and evaluated was shown to be capable of reliably transforming computer and radiologist outputs to a common scale, thereby allowing computer and radiologist outputs to be compared on the basis of a clinically relevant statistic.
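The four matching criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' estimation procedure. It assumes a hypothetical equal-variance binormal model for each classifier (so the likelihood ratio is monotonic in the output), and every name in it (`BinormalClassifier`, `match_lr`, `match_tpf`, `match_fpf`, `match_cdf`, the parameter values, and the 0.5 prevalence used for CDF matching) is an assumption introduced for illustration only.

```python
from statistics import NormalDist


class BinormalClassifier:
    """Hypothetical equal-variance binormal model of one classifier's output."""

    def __init__(self, mu_neg, mu_pos, sigma=1.0):
        self.neg = NormalDist(mu_neg, sigma)  # output on actually-negative cases
        self.pos = NormalDist(mu_pos, sigma)  # output on actually-positive cases

    def log_lr(self, x):
        # log[pos.pdf(x) / neg.pdf(x)], written in closed form for equal variances.
        s2 = self.neg.stdev ** 2
        gap = self.pos.mean - self.neg.mean
        return gap * (2 * x - self.pos.mean - self.neg.mean) / (2 * s2)

    def inv_log_lr(self, llr):
        # Output value at which the log likelihood ratio equals llr.
        s2 = self.neg.stdev ** 2
        gap = self.pos.mean - self.neg.mean
        return llr * s2 / gap + (self.pos.mean + self.neg.mean) / 2

    def cdf(self, x, prevalence):
        # CDF of the output over a case mix with the given fraction of positives.
        return (1 - prevalence) * self.neg.cdf(x) + prevalence * self.pos.cdf(x)


def match_lr(x, src, dst):
    """Map x to the dst output that has the same likelihood ratio."""
    return dst.inv_log_lr(src.log_lr(x))


def match_tpf(x, src, dst):
    """Map x to the dst threshold with the same true positive fraction."""
    # TPF(x) = 1 - pos.cdf(x), so equal TPF means equal positive-case CDF.
    return dst.pos.inv_cdf(src.pos.cdf(x))


def match_fpf(x, src, dst):
    """Map x to the dst threshold with the same false positive fraction."""
    return dst.neg.inv_cdf(src.neg.cdf(x))


def match_cdf(x, src, dst, prevalence=0.5):
    """Map x to the dst output at the same quantile of the mixed case population."""
    target = src.cdf(x, prevalence)
    lo = min(dst.neg.mean, dst.pos.mean) - 10 * dst.neg.stdev
    hi = max(dst.neg.mean, dst.pos.mean) + 10 * dst.neg.stdev
    for _ in range(100):  # bisection on the monotonic mixture CDF
        mid = (lo + hi) / 2
        if dst.cdf(mid, prevalence) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two hypothetical classifiers whose ROC curves differ (separations 2 and 1).
computer = BinormalClassifier(mu_neg=0.0, mu_pos=2.0)
reader = BinormalClassifier(mu_neg=0.0, mu_pos=1.0)

x = 2.0  # a computer output to be expressed on the reader's scale
for name, fn in [("LR", match_lr), ("TPF", match_tpf),
                 ("FPF", match_fpf), ("CDF", match_cdf)]:
    print(name, round(fn(x, computer, reader), 3))
```

Each function is monotonically increasing in `x`, so applying it to one classifier's output leaves that classifier's ROC curve unchanged. In this sketch the four mappings give different values for the same input because the two hypothetical ROC curves differ; if `reader` were given the same separation as `computer`, all four would coincide.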
