Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring

Abstract

Study Objectives: To quantify the amount of sleep stage ambiguity across expert scorers and to validate a new auto-scoring platform against sleep staging performed by multiple scorers.

Methods: We applied a new auto-scoring system to three datasets containing 95 PSGs scored by 6–12 scorers. The primary output compared was the set of sleep stage probabilities (hypnodensity, i.e., the probability of each sleep stage being assigned to a given epoch); we also compared a single sleep stage per epoch assigned by hierarchical majority rule.

Results: The percentage of epochs with 100% agreement across scorers was 46 ± 9%, 38 ± 10%, and 32 ± 9% for the datasets with 6, 9, and 12 scorers, respectively. The mean intra-class correlation coefficient between sleep stage probabilities from auto- and manual-scoring was 0.91, representing excellent reliability. Within each dataset, agreement between auto-scoring and consensus manual-scoring was significantly higher than agreement between manual-scoring and consensus manual-scoring (0.78 vs. 0.69; 0.74 vs. 0.67; and 0.75 vs. 0.67; all p < 0.01).

Conclusions: Analysis of scoring performed by multiple scorers reveals that sleep stage ambiguity is the rule rather than the exception. Probabilities of the sleep stages determined by artificial intelligence auto-scoring provide an excellent estimate of this ambiguity. When compared against consensus manual-scoring, sleep staging derived from auto-scoring is noninferior to manual-scoring for each individual PSG, meaning that auto-scoring output is ready for interpretation without the need for manual adjustment.
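To make the quantities in the abstract concrete, the following is a minimal sketch of how a manual hypnodensity, the share of epochs with 100% agreement, and a majority-rule consensus stage could be computed from multiple scorers. It is an illustration only, not the authors' implementation: the function names, the stage coding, the toy data, and in particular the tie-breaking priority are assumptions, since the abstract does not specify the hierarchy used in its majority rule.

```python
import numpy as np

# Assumed stage coding for this sketch: index into STAGES.
STAGES = ["W", "N1", "N2", "N3", "R"]


def hypnodensity(scores):
    """Per-epoch proportion of scorers assigning each stage.

    scores: integer array of shape (n_scorers, n_epochs).
    Returns an array of shape (n_epochs, n_stages) whose rows sum to 1.
    """
    scores = np.asarray(scores)
    one_hot = scores[:, :, None] == np.arange(len(STAGES))
    return one_hot.mean(axis=0)


def full_agreement_fraction(density):
    """Fraction of epochs in which all scorers chose the same stage."""
    return float(np.mean(np.isclose(density.max(axis=1), 1.0)))


def majority_stage(density, tie_priority=(0, 4, 1, 2, 3)):
    """Most frequently assigned stage per epoch.

    Ties go to the stage listed first in tie_priority. This ordering
    (W, R, N1, N2, N3) is a hypothetical choice for the sketch, not the
    hierarchy used in the paper.
    """
    density = np.asarray(density)
    order = np.asarray(tie_priority)
    reordered = density[:, order]
    is_top = np.isclose(reordered, reordered.max(axis=1, keepdims=True))
    # argmax on a boolean row returns the first True, i.e. the
    # highest-priority stage among those tied for the maximum.
    return order[is_top.argmax(axis=1)]


# Toy data: 4 scorers (rows) by 4 epochs (columns).
scores = np.array([
    [0, 1, 2, 4],
    [0, 1, 2, 4],
    [0, 2, 2, 4],
    [0, 2, 3, 4],
])

density = hypnodensity(scores)
agreement = full_agreement_fraction(density)
consensus = majority_stage(density)
```

In the toy data, the second epoch is split evenly between N1 and N2, so its consensus stage depends entirely on the assumed tie-breaking order; such epochs are where a probability-based output carries information that a single stage label discards.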