Interrater agreement between American and Chinese sleep centers according to the 2014 AASM standard

Objectives: To determine the inter-lab reliability of sleep stage scoring under the 2014 American Academy of Sleep Medicine (AASM) manual, to understand in depth the reasons for scoring disagreement, and to provide suggestions for improvement.

Methods: This study included 40 all-night polysomnograms (PSGs) from different samples, segmented into 37,642 30-s epochs. Five doctors from China and two doctors from America scored the epochs following the 2014 AASM standard. Scoring agreement between the two centers was evaluated using Cohen's kappa (κ). After visual inspection of the PSG epochs with deviating scores, potential reasons for disagreement were analyzed.

Results: Inter-lab reliability was substantial (κ = 0.75 ± 0.01). Scoring of stages W (κ = 0.89) and R (κ = 0.87) achieved the highest agreement, while stage N1 (κ = 0.45) showed the lowest. By relative disagreement ratio, N2-N3 (22.09%), W-N1 (19.68%), and N1-N2 (18.75%) were the most frequent discrepancy combinations. American and Chinese doctors each showed characteristic scoring patterns for the discrepancy combinations W-N1, N1-N2, and N2-N3. Seven reasons for disagreement were identified: "on-threshold characteristic" (29.21%), "context influence" (18.06%), "characteristic identification difficulty" (8.81%), "arousal-wake confusion" (7.57%), "derivation inconsistence" (2.15%), "on-borderline characteristic" (0.92%), and "misrecognition" (33.27%).

Conclusions: This study quantified sleep stage scoring agreement under the 2014 AASM manual and explored potential sources of labeling ambiguity. Improvement measures are suggested accordingly to help remove ambiguity for scorers and improve scoring reliability at the international level.
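Cohen's κ compares the observed agreement p_o (the fraction of epochs both centers labeled identically) with the agreement p_e expected by chance from each center's label frequencies: κ = (p_o - p_e) / (1 - p_e). The Python sketch below is illustrative only and rests on several assumptions not stated in the abstract: each center's scoring is reduced to one label per 30-s epoch (how the five Chinese and two American scorers were combined is not described here); per-stage κ is computed by collapsing labels to "stage vs. other" (a common convention, assumed here); and the "relative disagreement ratio" is read as each unordered stage pair's share of all disagreeing epochs, which is an assumption rather than the authors' stated definition. The helper names (`cohen_kappa`, `stage_kappa`, `pair_name`, `disagreement_ratios`) are hypothetical, and the example labels are invented toy data, not study data.

```python
from collections import Counter

# AASM stage labels, in the order used to name stage pairs (e.g. "W-N1").
STAGES = ("W", "N1", "N2", "N3", "R")


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of epochs given the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one identical label throughout;
        # kappa is undefined, so this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def stage_kappa(a, b, stage):
    """Per-stage kappa, assuming a one-vs-rest collapse to 'stage' vs 'other'."""
    return cohen_kappa([x == stage for x in a], [y == stage for y in b])


def pair_name(x, y):
    """Name an unordered stage pair in STAGES order, e.g. ('N1', 'W') -> 'W-N1'."""
    i, j = sorted((STAGES.index(x), STAGES.index(y)))
    return f"{STAGES[i]}-{STAGES[j]}"


def disagreement_ratios(a, b):
    """Share of all disagreeing epochs falling in each unordered stage pair.

    Reading the 'relative disagreement ratio' this way is an assumption.
    """
    counts = Counter(pair_name(x, y) for x, y in zip(a, b) if x != y)
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()} if total else {}


# Invented toy labels for ten 30-s epochs, one sequence per center.
center_a = ["W", "W", "N1", "N2", "N2", "N3", "N3", "R", "R", "N2"]
center_b = ["W", "N1", "N1", "N2", "N3", "N3", "N2", "R", "R", "N2"]

print(round(cohen_kappa(center_a, center_b), 3))  # 0.62
print({s: round(stage_kappa(center_a, center_b, s), 3) for s in STAGES})
print(disagreement_ratios(center_a, center_b))
```

The overall value can be cross-checked against scikit-learn's `sklearn.metrics.cohen_kappa_score(center_a, center_b)`, which computes the same unweighted statistic. Because κ discounts chance agreement, a rarely used stage such as N1 can receive a low per-stage κ even when most epochs are scored identically.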
