Observer variation and the performance accuracy gained by averaging ratings of abnormality.

Six radiologists used continuous scales to rate 529 chest-film cases for the likelihood of five different types of abnormality (interstitial disease, nodule, pneumothorax, alveolar infiltrate, and rib fracture) in each of six replicated readings, yielding 36 separate ratings of each case for each of the five abnormalities. Separate analyses of all cases and of subsets of the difficult/subtle cases for each abnormality estimated the relative gains in accuracy (linear-scaled area below the ROC curve) obtained by averaging the case ratings across (a) six independent replications by each reader (25% gain), (b) six different readers within each replication (34% gain), or (c) all 36 readings (48% gain). Although accuracy differed among both readers and abnormalities, ROC curves for the median ratings showed similar relative gains in accuracy, somewhat greater than those predicted from the measured rating correlations. A model for variance components in the observer's latent decision variable could predict these gains from measured correlations in the single ratings of cases. Depending on whether the model's estimates were based on realized accuracy gains or on rating correlations, about 48% or 39% of each reader's total decision variance (summed variance for positive and negative cases) consisted of random (within-reader) error that was uncorrelated between replications, another 10% or 14% came from idiosyncratic responses to individual cases, and about 43% or 47% was systematic variation that all readers found in the sampled cases.
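The link between the three variance components and the gains from averaging can be illustrated with a minimal sketch. It is not the authors' fitted model. It assumes that the accuracy index is inversely proportional to the standard deviation of the averaged decision variable, that averaging over readers reduces both the idiosyncratic and the random components, and that averaging over replications reduces only the random component. The function name `relative_gain` and its parameters are hypothetical.

```python
from math import sqrt


def relative_gain(systematic, idiosyncratic, random_error,
                  n_readers=1, n_reps=1):
    """Relative gain in a d'-like accuracy index from averaging ratings.

    Assumption (not taken from the paper): the index scales as
    1 / sqrt(variance of the averaged decision variable).
    """
    # Variance of a single rating: all three components contribute fully.
    total = systematic + idiosyncratic + random_error
    # Systematic case variation is shared by every reading, so averaging
    # cannot remove it. Idiosyncratic (reader-by-case) variation shrinks
    # only with the number of readers. Random within-reader error shrinks
    # with the total number of readings averaged.
    averaged = (systematic
                + idiosyncratic / n_readers
                + random_error / (n_readers * n_reps))
    return sqrt(total / averaged) - 1.0


if __name__ == "__main__":
    # Variance shares quoted in the abstract (estimates based on gains).
    shares = dict(systematic=0.43, idiosyncratic=0.10, random_error=0.48)
    for label, readers, reps in [("six replications, one reader", 1, 6),
                                 ("six readers, one replication", 6, 1),
                                 ("all 36 readings", 6, 6)]:
        gain = relative_gain(n_readers=readers, n_reps=reps, **shares)
        print(f"{label}: {gain:.1%}")
```

Under these assumptions the sketch reproduces the ordering reported above (replications, then readers, then all 36 readings), and the value for all 36 readings falls close to the reported 48%. The other two cases need not match the reported gains exactly, because the simplified scaling rule is only an approximation.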
