Optimal Classification of Respiratory Patterns From Manual Analyses Using Expectation-Maximization

Manual scoring (MS) of cardiorespiratory signals is the <italic>gold standard</italic> method for the analysis of respiratory data in sleep laboratories. In MS, trained, expert scorers characterize respiratory patterns by scrolling through a data record and visually identifying patterns. However, MS is limited by high intra- and inter-scorer variability and subjectivity. A strategy to mitigate this is to analyze the same respiratory data multiple times and generate a consensus. This consensus is generally determined by a majority vote (MV), where the most frequent pattern is selected as the <italic>true pattern</italic>. This paper presents expectation-maximization pattern sequence (EM-PSEQ), a novel method based on EM that estimates the <italic>true patterns</italic> optimally. A simulation study examined the accuracies of EM-PSEQ, MV, and individual scorers (IS) as a function of the number of analyses. Accuracy was measured with the Fleiss <italic>κ</italic> statistic, and is reported as <inline-formula><tex-math notation="LaTeX">$[{\kappa _{{\rm{MDN}}}} = x;{\kappa _{{\rm{P5}}}} = y]$</tex-math></inline-formula>, where <inline-formula><tex-math notation="LaTeX">${\kappa _{{\rm{MDN}}}}$</tex-math></inline-formula>, the median value, is the expected accuracy, and <inline-formula><tex-math notation="LaTeX">${\kappa _{{\rm{P5}}}}$</tex-math></inline-formula>, the 5th percentile value, gives the minimum accuracy for 95% confidence. IS accuracy remained constant at <inline-formula><tex-math notation="LaTeX">$[{\kappa _{{\rm{MDN}}}} = 0.67;{\kappa _{{\rm{P5}}}} = 0.60]$</tex-math></inline-formula> as the number of analyses increased. MV accuracy increased slowly with the number of analyses and plateaued at <inline-formula><tex-math notation="LaTeX">$[{\kappa _{{\rm{MDN}}}} = 0.78;{\kappa _{{\rm{P5}}}} = 0.76]$</tex-math></inline-formula> after five analyses.
In contrast, EM-PSEQ accuracy improved quickly, reaching an <italic>almost perfect</italic> value of <inline-formula><tex-math notation="LaTeX">$[{\kappa _{{\rm{MDN}}}} = 0.83;{\kappa _{{\rm{P5}}}} = 0.77]$</tex-math></inline-formula> with four analyses, and <italic>perfect</italic> accuracy <inline-formula><tex-math notation="LaTeX">$[{\kappa _{{\rm{MDN}}}} = 1.00;{\kappa _{{\rm{P5}}}} = 0.99]$</tex-math></inline-formula> after 25 analyses. EM-PSEQ performed much better than either MV or IS, and required only modest computational effort. Consequently, we believe EM-PSEQ will be a very valuable tool for clinical studies, as it can dramatically improve the accuracy of manual respiratory analysis with minimal additional cost.
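
To make the contrast between majority voting and an EM-based consensus concrete, the following is a minimal sketch and not the EM-PSEQ algorithm itself, whose details are not given in this abstract. As a stand-in it uses a generic observer-error EM in the style of Dawid and Skene: each analysis is assumed to have its own confusion matrix, and the true pattern of each sample is treated as a latent variable. The function names `majority_vote` and `dawid_skene`, the three-class setup, and the simulated scorer accuracies are all assumptions introduced purely for illustration.

```python
import numpy as np

def majority_vote(labels, n_classes):
    """labels: (n_analyses, n_samples) integer array of scored patterns.
    Returns the most frequent label per sample and the per-class vote counts."""
    counts = np.stack([(labels == k).sum(axis=0) for k in range(n_classes)], axis=1)
    return counts.argmax(axis=1), counts

def dawid_skene(labels, n_classes, n_iter=50, eps=1e-6):
    """Generic observer-error EM (illustrative stand-in, not EM-PSEQ).
    Returns the estimated true label per sample and one confusion matrix per analysis."""
    _, counts = majority_vote(labels, n_classes)
    # Initialize the posterior over true labels from the vote fractions.
    post = counts / counts.sum(axis=1, keepdims=True)      # (samples, classes)
    onehot = np.eye(n_classes)[labels]                     # (analyses, samples, classes)
    for _ in range(n_iter):
        # M-step: class prior and per-analysis confusion matrices,
        # confusion[s, t, o] = P(analysis s reports o | true class t).
        prior = post.mean(axis=0)
        confusion = np.einsum('nt,sno->sto', post, onehot) + eps
        confusion /= confusion.sum(axis=2, keepdims=True)
        # E-step: posterior over the true class of each sample.
        log_post = np.log(prior + eps) + np.einsum('sno,sto->nt', onehot, np.log(confusion))
        log_post -= log_post.max(axis=1, keepdims=True)    # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
    return post.argmax(axis=1), confusion

# Simulated example (hypothetical data): five analyses of unequal quality.
rng = np.random.default_rng(0)
n_classes, n_samples = 3, 2000
truth = rng.integers(0, n_classes, size=n_samples)
accuracies = [0.9, 0.7, 0.6, 0.6, 0.55]
labels = np.empty((len(accuracies), n_samples), dtype=int)
for s, acc in enumerate(accuracies):
    # An erroneous score is drawn uniformly from the other classes.
    wrong = (truth + rng.integers(1, n_classes, size=n_samples)) % n_classes
    labels[s] = np.where(rng.random(n_samples) < acc, truth, wrong)

mv, _ = majority_vote(labels, n_classes)
em, confusion = dawid_skene(labels, n_classes)
print("MV agreement with truth:", (mv == truth).mean())
print("EM agreement with truth:", (em == truth).mean())
```

The design point the sketch illustrates is that majority voting weights every analysis equally, whereas an EM estimate learns how reliable each analysis is and weights it accordingly. That is one plausible reason an EM-based consensus can keep improving as analyses are added.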
