Radiology's Achilles' heel: error and variation in the interpretation of the Röntgen image.

The performance of the human eye and brain has failed to keep pace with the enormous technical progress in the first full century of radiology. Errors and variations in interpretation now represent the weakest aspect of clinical imaging. Those interpretations which differ from the consensus view of a panel of "experts" may be regarded as errors; where experts fail to achieve consensus, differing reports are regarded as "observer variation". Errors arise from poor technique, failures of perception, lack of knowledge and misjudgments. Observer variation is substantial and should be taken into account when different diagnostic methods are compared; in many cases the difference between observers outweighs the difference between techniques. Strategies for reducing error include attention to viewing conditions, training of the observers, availability of previous films and relevant clinical data, dual or multiple reporting, standardization of terminology and report format, and assistance from computers. Digital acquisition and display will probably not affect observer variation but the performance of radiologists, as measured by receiver operating characteristic (ROC) analysis, may be improved by computer-directed search for specific image features. Other current developments show that where image features can be comprehensively described, computer analysis can replace the perception function of the observer, whilst the function of interpretation can in some cases be performed better by artificial neural networks. However, computer-assisted diagnosis is still in its infancy and complete replacement of the human observer is as yet a remote possibility.
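As an illustration of how observer variation between two readers can be quantified, the sketch below computes Cohen's kappa, a widely used chance-corrected agreement statistic. The abstract does not say which statistic the article relies on, so the choice of kappa is an assumption here. The function name `cohen_kappa` and the two lists of reports are likewise hypothetical and for illustration only.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Chance-corrected agreement between two readers rating the same cases."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must rate the same non-empty set of cases")
    n = len(reader_a)
    # Proportion of cases on which the two readers gave the same category.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance, from each reader's category frequencies.
    counts_a = Counter(reader_a)
    counts_b = Counter(reader_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both readers used one identical category throughout; kappa is
        # formally undefined, so this sketch reports perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical reports on ten films (invented data, not from the article).
reader_a = ["abnormal", "normal", "normal", "abnormal", "normal",
            "normal", "abnormal", "normal", "normal", "normal"]
reader_b = ["abnormal", "normal", "abnormal", "abnormal", "normal",
            "normal", "normal", "normal", "normal", "normal"]

kappa = cohen_kappa(reader_a, reader_b)
print(f"kappa = {kappa:.2f}")
```

In this invented example the readers agree on 8 of 10 films, but because both call most films normal, much of that agreement would be expected by chance, and kappa comes out well below the raw 80% agreement.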
