Feature Selection Algorithm based on Random Forest applied to Sleep Apnea Detection

This paper presents a new feature selection method based on changes in the out-of-bag (OOB) Cohen's kappa of a random forest (RF) classifier. The method was tested on the automatic detection of sleep apnea from the oxygen saturation (SpO2) signal. It builds on the RF predictor importance, defined as the increase in error when the values of a feature are permuted, and improves on it in three ways: the classification error is replaced by Cohen's kappa, an extra factor is added to avoid selecting correlated features, and the OOB sample selection is adapted to obtain a patient-independent validation. Applied to sleep apnea classification, the method selected an optimal set of 3 features out of 286, half of the 6 features obtained in our previous study. This reduction improved the interpretability of the model at the cost of a slight decrease in performance, without affecting the clinical screening performance. Feature selection is an important issue in machine learning, and especially in biomedical informatics. The proposed method introduces improvements to RF feature selection that can lead to a reduced feature set and improved classifier interpretability.
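As a rough illustration of two of the ideas above, the following Python sketch computes a permutation importance as the drop in Cohen's kappa and draws a bootstrap at patient level so that out-of-bag rows come from unseen patients. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`cohen_kappa`, `kappa_importance`, `patient_oob_split`), the generic `predict` callable standing in for a trained tree or forest, and the toy data are all hypothetical. The correlation factor is not sketched because the abstract does not define it.

```python
import random
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Assumption: return 0 in the degenerate single-class case (0/0).
        return 0.0
    return (observed - expected) / (1.0 - expected)


def kappa_importance(predict, X, y, feature, rng):
    """Drop in kappa after permuting one feature column of the given rows.

    X is a list of rows (lists); predict maps one row to a class label.
    """
    baseline = cohen_kappa(y, [predict(row) for row in X])
    column = [row[feature] for row in X]
    rng.shuffle(column)
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
    permuted = cohen_kappa(y, [predict(row) for row in X_perm])
    return baseline - permuted


def patient_oob_split(patient_ids, rng):
    """Bootstrap patients (not rows); rows of unsampled patients are OOB.

    Simplification: in-bag rows are listed once, even if their patient
    was drawn several times in the bootstrap.
    """
    patients = sorted(set(patient_ids))
    in_bag_patients = set(rng.choices(patients, k=len(patients)))
    in_bag = [i for i, p in enumerate(patient_ids) if p in in_bag_patients]
    oob = [i for i, p in enumerate(patient_ids) if p not in in_bag_patients]
    return in_bag, oob


# Toy usage: feature 0 determines the label, feature 1 is pure noise.
rng = random.Random(0)
X = [[float(i), rng.random()] for i in range(20)]
y = [int(row[0] >= 10) for row in X]
toy_predict = lambda row: int(row[0] >= 10)  # stand-in for a trained model

imp_signal = kappa_importance(toy_predict, X, y, 0, rng)
imp_noise = kappa_importance(toy_predict, X, y, 1, rng)  # exactly 0.0 here

patient_ids = [i // 4 for i in range(20)]  # 5 hypothetical patients, 4 rows each
in_bag, oob = patient_oob_split(patient_ids, rng)
```

Because the toy predictor ignores feature 1, permuting it cannot change any prediction, so its kappa drop is zero, while permuting feature 0 can only lower the agreement. In an actual forest the importance would be averaged over trees, each evaluated on its own out-of-bag patients.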
