EEG interpretation reliability and interpreter confidence: A large single-center study

The intrarater and interrater reliability (I&IR) of EEG interpretation has significant implications for the value of EEG as a diagnostic tool. We measured both the intrarater and the interrater reliability of EEG interpretation, based on the classification of complete EEGs into standard diagnostic categories and on each rater's confidence in their interpretations, and investigated sources of variance in EEG interpretation. During two distinct time intervals, six board-certified clinical neurophysiologists classified 300 EEGs into one or more of seven diagnostic categories and assigned a subjective confidence to each interpretation. Each EEG was read by three readers. Each reader interpreted 150 unique studies, and 50 of these studies were re-interpreted to generate intrarater data. A generalizability study assessed the contribution of subjects, readers, and the subject-by-reader interaction to interpretation variance. Five of the six readers had a median confidence of ≥99%, and the upper quartile of confidence values was 100% for all six readers. Intrarater Cohen's kappa (κc) ranged from 0.33 to 0.73, with an aggregated value of 0.59. Interrater Cohen's kappa ranged from 0.29 to 0.62 across the 15 reader pairs, with an aggregated Fleiss kappa of 0.44. Cohen's kappa did not differ significantly across reader pairs (chi-square = 17.3, df = 14, p = 0.24). Subjects (i.e., EEGs) accounted for 65.3% of interpretation variance, readers for 3.9%, and the reader-by-subject interaction for 30.8%. Experienced epileptologists have very high confidence in their EEG interpretations and low to moderate I&IR, a common paradox in clinical medicine. A necessary, but insufficient, condition for improving EEG interpretation accuracy is to increase intrarater and interrater reliability. This goal could be accomplished, for instance, with an automated online application integrated into a continuing medical education module that measures and reports EEG I&IR to individual users.
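
As a rough illustration of the agreement statistics reported above, the sketch below computes pairwise Cohen's kappa and an aggregated Fleiss kappa from a ratings matrix. This is a minimal sketch, not the authors' analysis code: the synthetic ratings matrix, the single-label simplification (each EEG assigned exactly one of the seven categories), and the specific library calls are assumptions made for illustration only.

```python
# Minimal sketch (assumed, not the authors' code): agreement statistics for a
# design in which each EEG is classified by 3 readers into one of 7 categories.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)

# Hypothetical data: 150 EEGs (rows) x 3 readers (columns), category labels 0-6.
# In the real study these would be the readers' diagnostic classifications.
ratings = rng.integers(0, 7, size=(150, 3))

# Interrater agreement for a single reader pair: Cohen's kappa.
kappa_pair = cohen_kappa_score(ratings[:, 0], ratings[:, 1])

# Aggregated agreement across all readers: Fleiss' kappa, computed from the
# subjects-by-categories count table.
counts, _ = aggregate_raters(ratings)
kappa_fleiss = fleiss_kappa(counts, method="fleiss")

print(f"Cohen's kappa (reader 1 vs reader 2): {kappa_pair:.2f}")
print(f"Fleiss' kappa (all three readers):    {kappa_fleiss:.2f}")
```

With random ratings both statistics hover near zero; applied to the actual reader classifications, the same calculations would yield the pairwise and aggregated values reported in the abstract.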
