The Use of Synthetic Electronic Health Record Data and Deep Learning to Improve Timing of High-Risk Heart Failure Surgical Intervention by Predicting Proximity to Catastrophic Decompensation

Objective: Although many clinical metrics are associated with proximity to decompensation in heart failure (HF), none is individually accurate enough to risk-stratify HF patients on a patient-by-patient basis. The dire consequences of this inaccuracy have profoundly lowered the clinical threshold for applying high-risk surgical interventions, such as ventricular assist device placement. Machine learning can detect non-intuitive patterns that combine the predictive value of individual patient features in novel ways. A machine learning-based clinical tool that identifies proximity to catastrophic HF deterioration on a patient-specific basis would allow high-risk surgical intervention to be directed more efficiently to those patients who have the most to gain from it, while sparing others. Synthetic electronic health record (EHR) data are statistically indistinguishable from the original protected health information and can be analyzed as if they were the original data, but without any privacy concerns. We demonstrate that synthetic EHR data can be easily accessed and analyzed and are amenable to machine learning analyses.

Methods: We developed synthetic data from the EHR data of 26,575 HF patients admitted to a single institution during the decade ending on December 31, 2018. Twenty-seven clinically relevant features were synthesized and used in supervised deep learning and machine learning algorithms (deep neural networks [DNN], random forest [RF], and logistic regression [LR]) to explore their ability to predict 1-year mortality, evaluated by five-fold cross-validation. We conducted two sets of analyses: one using features recorded at or before the time of HF diagnosis, and one using features recorded at or after it.

Results: The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of the three models. The mean AUC was 0.80 for DNN, 0.72 for RF, and 0.74 for LR. Age, creatinine, body mass index, and blood pressure were especially important features in predicting death within 1 year among HF patients.

Conclusions: Machine learning models have considerable potential to improve the accuracy of mortality prediction, so that high-risk surgical intervention can be applied only in those patients who stand to benefit from it. Access to EHR-based synthetic data derivatives eliminates the risk of exposing EHR data, speeds time-to-insight, and facilitates data sharing. As more clinical, imaging, and contractile features with proven predictive capability are added to these models, the development of a clinical tool to assist in timing intervention in surgical candidates may become possible.
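
The evaluation described in the Methods (train a classifier, score held-out patients, and average the AUC over five folds) can be illustrated with a minimal sketch. This is not the authors' pipeline. It covers only logistic regression, one of the three model families, and it runs on a hypothetical two-feature toy cohort rather than the 27 synthesized EHR features. The helper names (`roc_auc`, `k_fold_indices`, `fit_logistic`, `cross_validated_auc`, `make_toy_cohort`), the learning rate, the epoch count, and the toy coefficients are all assumptions made for illustration.

```python
import math
import random


def roc_auc(labels, scores):
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k disjoint test folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Plain batch gradient descent on the logistic loss; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def cross_validated_auc(X, y, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and return the mean AUC."""
    rng = random.Random(seed)
    aucs = []
    for test_idx in k_fold_indices(len(X), k, rng):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = [sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b) for i in test_idx]
        aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return sum(aucs) / k


def make_toy_cohort(n=200, seed=1):
    """Hypothetical standardized features; outcome is 1-year death (1) or survival (0)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(1.5 * a + 1.0 * c - 1.0) else 0 for a, c in X]
    return X, y


X, y = make_toy_cohort()
mean_auc = cross_validated_auc(X, y)
print(f"mean five-fold AUC: {mean_auc:.2f}")
```

The pairwise definition of AUC used here is quadratic in cohort size, which is fine for a toy example but would be replaced by a rank-based computation on a cohort of tens of thousands of patients. A real analysis would also likely stratify the folds by outcome so that each held-out fold preserves the mortality rate.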
