Feature Selection Strategy for Intrahospital Mortality Prediction after Coronary Artery Bypass Graft Surgery on an Unbalanced Sample

The aim of the study is to develop models for predicting intrahospital mortality (IHM) on an unbalanced sample of patients with coronary artery disease (CAD) after coronary artery bypass graft (CABG) surgery. Methods. The IHM prediction models were built from 866 electronic case histories of CAD patients who underwent revascularization by CABG. The cohort consisted of two groups: the first included 35 (4%) patients who died within the first 30 days after CABG, and the second included 831 (96%) patients with a favorable surgical outcome. We analyzed 99 factors, including the results of clinical, laboratory and instrumental examinations performed before CABG. Features were selected using classical filter methods and model-based selection (wrapper methods). The main obstacle to the classical approach was the class imbalance, since the group of deceased patients made up only 4% of the sample. Under this imbalance, it was not possible to apply cross-validation with three types of samples (training, validation and test), standard quality metrics or multi-category factors. Results. We proposed a multi-stage feature selection procedure that combines validation of predefined predictors, filter methods and development of multifactor models based on logistic regression, random forest (RF) and artificial neural networks (ANNs). Model accuracy was evaluated with a combined quality metric. The RF- and ANN-based models not only yielded more accurate prediction tools but also helped verify five additional IHM predictors.
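Why standard metrics and plain cross-validation struggle at a 35/831 class ratio can be illustrated with a short sketch. It is plain Python written for illustration, not the authors' code. It shows two generic points. First, a trivial rule that predicts survival for every patient scores about 96% accuracy yet has zero sensitivity and a zero Matthews correlation coefficient (MCC). Second, stratified folds keep the rare outcome present in every test fold. The metric set (sensitivity, specificity, balanced accuracy, MCC) and the function names `confusion`, `imbalance_metrics` and `stratified_folds` are hypothetical choices. The study's own combined quality metric is not specified in the abstract and is not reproduced here.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers for illustration only; not taken from the study.
# Label convention (assumed): 1 = in-hospital death, 0 = survival.


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def imbalance_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy alongside metrics that stay informative when one class is rare."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # MCC taken as 0 when undefined
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sens,
        "specificity": spec,
        "balanced_accuracy": (sens + spec) / 2,
        "mcc": mcc,
    }


def stratified_folds(y: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split indices into k folds, dealing each class round-robin so that
    every fold receives roughly the same share of the rare outcome."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


# Class sizes from the abstract: 35 deaths and 831 survivors.
y = [1] * 35 + [0] * 831

# An "everyone survives" rule: accuracy is about 0.96, sensitivity and MCC are 0.
print(imbalance_metrics(y, [0] * len(y)))

# Five stratified folds: print (fold size, number of deaths in the fold).
for fold in stratified_folds(y, k=5):
    print(len(fold), sum(y[i] for i in fold))
```

With five stratified folds, each test fold receives 7 of the 35 deaths. Splitting the data further, for example into separate training, validation and test samples, would leave still fewer deaths in each part. This is consistent with the difficulty the abstract describes.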
