Reader Variability in Mammography and Its Implications for Expected Utility over the Population of Readers and Cases

The multiple-reader, multiple-case (MRMC) approach to receiver operating characteristic (ROC) analysis is becoming the dominant assessment paradigm in medical imaging. In its most common version, many readers read every patient case in the study. This feature is critical because differences among competing imaging modalities are often dominated by differences in reader performance. The present authors have carried out MRMC ROC analysis on a uniquely large data set for mammography. The analysis quantifies the wide range of reader skill observed in that data set. It also demonstrates that the sample sizes are large enough for the conclusions to generalize to the sampled populations of readers and cases with little uncertainty arising from the finite sample size. A schematic approach to bracketing the utility matrix is then used to study trends in the expected utility functions that correspond to the range of observed ROC curves. This is done for both the screening and the diagnostic context. The results raise two hypotheses for further investigation. First, the present ambiguity surrounding the effectiveness of mammography may be due in part to the observed range of reader skills and the corresponding expected utility functions. Second, computer-assisted modalities for mammography may improve the expected utility function not only for screening but also in the diagnostic context, especially for the lower-performing readers.
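To make the link between an ROC curve and an expected utility function concrete, the following is a minimal sketch, not the authors' method. It assumes an equal-variance binormal ROC curve with separation parameter `a`, and it uses the standard decision-theoretic form in which expected utility is the prevalence-weighted sum of the four outcome utilities. The function names, the prevalence, and the utility values are all hypothetical and chosen only for illustration; none of them comes from the study.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for the binormal ROC model


def binormal_tpf(fpf, a, b=1.0):
    """True-positive fraction on a binormal ROC curve at a given
    false-positive fraction (assumed model: TPF = Phi(a + b * Phi^-1(FPF)))."""
    return _STD.cdf(a + b * _STD.inv_cdf(fpf))


def expected_utility(fpf, tpf, prevalence, u_tp, u_fn, u_fp, u_tn):
    """Prevalence-weighted expected utility at one operating point."""
    diseased = prevalence * (u_tp * tpf + u_fn * (1.0 - tpf))
    healthy = (1.0 - prevalence) * (u_fp * fpf + u_tn * (1.0 - fpf))
    return diseased + healthy


def best_operating_point(a, prevalence, utilities, grid=999):
    """Scan the ROC curve on a grid of FPF values strictly inside (0, 1)
    and return (expected utility, FPF, TPF) at the maximum."""
    best = None
    for i in range(1, grid + 1):
        fpf = i / (grid + 1)
        tpf = binormal_tpf(fpf, a)
        eu = expected_utility(fpf, tpf, prevalence, **utilities)
        if best is None or eu > best[0]:
            best = (eu, fpf, tpf)
    return best


# Hypothetical utility matrix and screening-like prevalence (illustrative only).
UTILITIES = {"u_tp": 0.8, "u_fn": 0.0, "u_fp": 0.95, "u_tn": 1.0}
PREVALENCE = 0.005

if __name__ == "__main__":
    # Two hypothetical readers: a lower-skill and a higher-skill ROC curve.
    for a in (1.0, 2.0):
        eu, fpf, tpf = best_operating_point(a, PREVALENCE, UTILITIES)
        print(f"a={a}: max EU={eu:.5f} at FPF={fpf:.3f}, TPF={tpf:.3f}")
```

Under these assumptions, a reader with a higher ROC curve reaches a higher maximum expected utility for any fixed utility matrix in which a true positive is worth more than a false negative. Repeating the scan for several plausible utility matrices is one simple way to bracket the result when the true utilities are uncertain.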
