Context bias. A problem in diagnostic radiology.

OBJECTIVE To determine whether radiologists' interpretations of images are biased by the context in which the images are read and by the prevalence of disease in other recently observed cases.

METHODS A test set of 24 right pulmonary arteriograms with a 33% prevalence of pulmonary emboli (PE) was assembled and embedded in 2 larger groups of films. Group A contained 16 additional arteriograms, all showing PE involving the right lung, so that total prevalence was 60%. Group B contained 16 additional arteriograms without PE, so that total prevalence was 20%. Six radiologists were randomly assigned to see either group first and then "cross over" to review the other group after a hiatus of at least 8 weeks. The direction of change on a 5-point rating scale between the 2 readings of each film in the test set was compared with the sign test; mean sensitivity, specificity, and areas under receiver operating characteristic (ROC) curves were compared with the paired t test.

RESULTS In the context of group A's higher disease prevalence, radiologists shifted more of their diagnoses toward higher suspicion than expected by chance (P=.03, sign test). In group A, mean sensitivity for diagnosing PE was significantly higher (75% vs 60%; P=.04), and area under the ROC curve was significantly larger (0.88 vs 0.82; P=.02).

CONCLUSIONS Radiologists' diagnoses are significantly influenced by the context of interpretation, even when spectrum and verification bias are avoided. This "context bias" effect is unique to the evaluation of subjectively interpreted tests, and illustrates the difficulty of obtaining unbiased estimates of diagnostic accuracy for both new and existing technologies.
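The prevalence figures in the design follow from simple counts, and the sign test used for the rating shifts is short enough to write out. The sketch below is illustrative only and is not the authors' analysis code. It assumes that the 33% test-set prevalence corresponds to 8 of 24 films (inferred from the percentage, not stated directly), and it implements a two-sided exact sign test, although the abstract does not say whether the reported test was one- or two-sided. The function names are hypothetical.

```python
from math import comb

# Counts taken from the study design described above.
TEST_TOTAL = 24   # films in the shared test set
TEST_PE = 8       # assumption: 33% of 24, rounded to a whole film
EXTRA_FILMS = 16  # additional films embedded in each group


def prevalence(n_positive: int, n_total: int) -> float:
    """Fraction of films showing PE."""
    return n_positive / n_total


# Group A adds 16 PE-positive films; group B adds 16 PE-negative films.
group_a_prevalence = prevalence(TEST_PE + EXTRA_FILMS, TEST_TOTAL + EXTRA_FILMS)
group_b_prevalence = prevalence(TEST_PE, TEST_TOTAL + EXTRA_FILMS)


def sign_test_two_sided(n_up: int, n_down: int) -> float:
    """Exact two-sided sign test p-value.

    n_up and n_down are the numbers of paired readings whose rating moved
    up or down between the two contexts; ties are excluded beforehand.
    Under the null hypothesis each direction has probability 0.5.
    """
    n = n_up + n_down
    k = max(n_up, n_down)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


if __name__ == "__main__":
    print(f"Group A prevalence: {group_a_prevalence:.0%}")
    print(f"Group B prevalence: {group_b_prevalence:.0%}")
```

The abstract does not report how many ratings moved in each direction, so no attempt is made here to reproduce the published P value; `sign_test_two_sided` would need those counts as input.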
