Interrater Reliability of Experts in Identifying Interictal Epileptiform Discharges in Electroencephalograms.

Importance: The validity of using electroencephalograms (EEGs) to diagnose epilepsy requires reliable detection of interictal epileptiform discharges (IEDs). Prior interrater reliability (IRR) studies are limited by small samples and selection bias.

Objective: To assess the reliability of experts in detecting IEDs in routine EEGs.

Design, Setting, and Participants: This prospective analysis conducted in 2 phases included as participants physicians with at least 1 year of subspecialty training in clinical neurophysiology. In phase 1, 9 experts independently identified candidate IEDs in 991 EEGs (1 expert per EEG) reported in the medical record to contain at least 1 IED, yielding 87 636 candidate IEDs. In phase 2, the candidate IEDs were clustered into groups with distinct morphological features, yielding 12 602 clusters, and a representative candidate IED was selected from each cluster. We added 660 waveforms (11 random samples each from 60 randomly selected EEGs reported as being free of IEDs) as negative controls. Eight experts independently scored all 13 262 candidates as IEDs or non-IEDs. The 1051 EEGs in the study were recorded at the Massachusetts General Hospital between 2012 and 2016.

Main Outcomes and Measures: Primary outcome measures were percentage of agreement (PA) and beyond-chance agreement (Gwet κ) for individual IEDs (IED-wise IRR) and for whether an EEG contained any IEDs (EEG-wise IRR). Secondary outcomes were the correlations between numbers of IEDs marked by experts across cases, calibration of expert scoring to group consensus, and receiver operating characteristic analysis of how well multivariate logistic regression models may account for differences in the IED scoring behavior between experts.

Results: Among the 1051 EEGs assessed in the study, 540 (51.4%) were those of females and 511 (48.6%) were those of males. In phase 1, 9 experts each marked potential IEDs in a median of 65 (interquartile range [IQR], 28-332) EEGs. The total number of IED candidates marked was 87 636. Expert IRR for the 13 262 individually annotated IED candidates was fair, with the mean PA being 72.4% (95% CI, 67.0%-77.8%) and mean κ being 48.7% (95% CI, 37.3%-60.1%). The EEG-wise IRR was substantial, with the mean PA being 80.9% (95% CI, 76.2%-85.7%) and mean κ being 69.4% (95% CI, 60.3%-78.5%). A statistical model based on waveform morphological features, when provided with individualized thresholds, explained the median binary scores of all experts with a high degree of accuracy of 80% (range, 73%-88%).

Conclusions and Relevance: This study's findings suggest that experts can identify whether EEGs contain IEDs with substantial reliability. Lower reliability regarding individual IEDs may be largely explained by various experts applying different thresholds to a common underlying statistical model.
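The two primary outcome measures can be illustrated with a small sketch. It assumes binary scores (1 = IED, 0 = non-IED), uses the two-rater form of Gwet's first-order agreement coefficient (AC1) as the beyond-chance statistic, and averages over all rater pairs. The abstract does not state the exact estimator or averaging scheme used in the study, so the function names and the pairwise averaging here are illustrative assumptions rather than the authors' procedure.

```python
from itertools import combinations


def percent_agreement(a, b):
    """Fraction of items on which two raters gave the same binary score."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def gwet_ac1(a, b):
    """Gwet's AC1 for two raters and two categories (0 or 1).

    Chance agreement is 2 * pi * (1 - pi), where pi is the mean of the two
    raters' proportions of positive scores. It never exceeds 0.5, so the
    denominator below cannot be zero.
    """
    pa = percent_agreement(a, b)
    pi = (sum(a) / len(a) + sum(b) / len(b)) / 2
    pe = 2 * pi * (1 - pi)
    return (pa - pe) / (1 - pe)


def mean_pairwise(scores, metric):
    """Average a two-rater metric over all pairs of raters (an assumption)."""
    pairs = list(combinations(scores, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical scores from three raters on eight candidate waveforms.
    raters = [
        [1, 1, 0, 0, 1, 0, 1, 0],
        [1, 0, 0, 0, 1, 0, 1, 1],
        [1, 1, 0, 0, 1, 0, 1, 0],
    ]
    print("mean PA :", mean_pairwise(raters, percent_agreement))
    print("mean AC1:", mean_pairwise(raters, gwet_ac1))
```

The same functions would apply to EEG-wise agreement if each rater's list held one score per EEG (1 if the rater marked any IED in that recording) instead of one score per waveform.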
