Interrater and intrarater agreements of magnetic resonance imaging findings in the lumbar spine: significant variability across degenerative conditions.

BACKGROUND CONTEXT Magnetic resonance imaging (MRI) is frequently used to evaluate degenerative conditions of the lumbar spine. How interrater and intrarater agreement on MRI findings compares across different pathologic conditions remains underexplored, because most studies focus on specific findings. PURPOSE The purpose of this study was to characterize the interrater and intrarater agreement of MRI findings used to assess the degenerative lumbar spine. STUDY DESIGN A retrospective diagnostic study was undertaken at a large academic medical center, in which a panel of orthopedic surgeons and musculoskeletal radiologists assessed lumbar MRIs using standardized criteria. PATIENT SAMPLE Seventy-five subjects who underwent routine lumbar spine MRI at our institution were included. OUTCOME MEASURES Each MRI study was assessed for 10 lumbar degenerative findings using standardized criteria. Where applicable, each lumbar vertebral level was assessed separately, for a total of 52 data points per study. METHODS T2-weighted axial and sagittal MRI sequences were presented in random order to four reviewers (two orthopedic spine surgeons and two musculoskeletal radiologists), who assessed them independently to determine interrater agreement. The first 10 studies were reevaluated at the end of the review to determine intrarater agreement. Images were graded using standardized, pilot-tested criteria for disc degeneration, stenosis, and other degenerative changes. Interrater and intrarater absolute percent agreements were calculated. To highlight the most clinically important disagreements, a modified agreement analysis was also performed, in which disagreements between the two lowest severity grades of applicable conditions were ignored. Fleiss kappa coefficients were calculated for interrater agreement. RESULTS The overall absolute and modified interrater agreements were 76.9% and 93.5%, respectively. The absolute and modified intrarater agreements were 81.3% and 92.7%, respectively. The average Fleiss kappa coefficient was 0.431, suggesting moderate overall agreement. However, when stratified by condition, absolute interrater agreement ranged from 65.1% to 92.0%; disc hydration, disc space height, and bone marrow changes showed the lowest absolute interrater agreement. Absolute intrarater agreement had a narrower range, from 74.5% to 91.5%. Fleiss kappa coefficients ranged from 0.282 to 0.618, indicating fair to substantial agreement. CONCLUSIONS Even with standardized evaluation criteria, interrater and intrarater agreement on MRI varied significantly across the degenerative conditions of the lumbar spine. Clinicians should be aware of the condition-specific diagnostic limitations of MRI interpretation.
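The abstract does not state exactly how percent agreement was computed across four readers. One common definition, assumed here, is the mean proportion of agreeing reader pairs per item. The "modified" analysis, which ignores disagreements between the two lowest severity grades, can then be approximated by treating those two grades as identical before comparing. This is a minimal sketch under those assumptions: grades are assumed to be integers with 0 as the lowest severity, and the function name `pairwise_agreement` and the toy ratings are hypothetical, not the study's data or code.

```python
from itertools import combinations


def pairwise_agreement(ratings, collapse_lowest_two=False):
    """Mean proportion of agreeing rater pairs across items (an assumed definition).

    ratings: list of items; each item is a list of integer grades, one per
    rater (or one per reading session, for intrarater agreement).
    Assumption: grade 0 is the lowest severity and grade 1 the next lowest.
    If collapse_lowest_two is True, grades 0 and 1 are treated as equal,
    approximating the 'modified' agreement described in the abstract.
    """
    def norm(grade):
        # Map grades 0 and 1 to the same value when collapsing.
        return 0 if collapse_lowest_two and grade <= 1 else grade

    total = 0.0
    for item in ratings:
        pairs = list(combinations(item, 2))
        agreeing = sum(norm(a) == norm(b) for a, b in pairs)
        total += agreeing / len(pairs)
    return total / len(ratings)


if __name__ == "__main__":
    # Hypothetical grades from four readers on four items.
    toy = [[0, 0, 0, 0], [0, 1, 1, 1], [2, 2, 3, 2], [1, 1, 1, 0]]
    print(pairwise_agreement(toy))                            # 0.625
    print(pairwise_agreement(toy, collapse_lowest_two=True))  # 0.875
```

For intrarater agreement, the same function applies with two grades per item (a reader's first and repeat reading), so each item contributes a single pair that either agrees or does not.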
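Fleiss kappa corrects observed multi-rater agreement for the agreement expected by chance. Below is a minimal Python sketch of the standard Fleiss (1971) formula, together with a helper that maps a kappa value to the Landis and Koch (1977) benchmark labels. The abstract does not say which benchmark scale it used, but its labels (0.282 fair, 0.431 moderate, 0.618 substantial) are consistent with that scale. The function names and toy ratings are hypothetical, and every item is assumed to be rated by the same number of readers.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for N items, each rated by the same number n of raters."""
    N = len(ratings)
    n = len(ratings[0])
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    counts = [Counter(item) for item in ratings]  # n_ij: raters choosing j on item i
    categories = sorted({g for item in ratings for g in item})
    # p_j: share of all N * n assignments that fall into category j
    p = [sum(c[j] for c in counts) / (N * n) for j in categories]
    # P_i: proportion of agreeing rater pairs on item i
    P_i = [(sum(c[j] ** 2 for j in categories) - n) / (n * (n - 1)) for c in counts]
    P_bar = sum(P_i) / N            # mean observed agreement
    P_e = sum(pj ** 2 for pj in p)  # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (P_bar - P_e) / (1 - P_e)


def landis_koch(kappa):
    """Verbal label for kappa using the Landis and Koch (1977) benchmarks."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical grades from four readers on four items.
    toy = [[0, 0, 0, 0], [0, 1, 1, 1], [2, 2, 3, 2], [1, 1, 1, 0]]
    k = fleiss_kappa(toy)
    print(round(k, 3), landis_koch(k))  # 0.448 moderate
```

Fleiss kappa treats grades as unordered categories, so a one-grade disagreement is penalized exactly like a three-grade one. That is one reason a percent-agreement analysis that ignores low-grade disagreements can look considerably better than kappa on the same readings.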
