Integrated Machine Learning Approaches for Predicting Ischemic Stroke and Thromboembolism in Atrial Fibrillation

Atrial fibrillation (AF) is a common cardiac rhythm disorder that increases the risk of ischemic stroke and other thromboembolism (TE). Accurate prediction of TE is highly valuable for early intervention in AF patients. However, the prediction performance of previous TE risk models for AF has not been satisfactory. In this study, we used integrated machine learning and data mining approaches to build 2-year TE prediction models for AF from Chinese Atrial Fibrillation Registry data. We first performed data cleansing and imputation on the raw data to generate a usable dataset. We then applied a series of feature construction and selection methods to identify predictive risk factors, and trained supervised learning models on the selected features. The experimental results show that our approach achieves higher prediction performance (AUC 0.71 to 0.74) than previous TE prediction models for AF (AUC 0.66 to 0.69) and also identifies potential new risk factors.
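
The abstract names the stages of the pipeline (imputation, feature selection, supervised learning, evaluation by AUC) but not the specific algorithms. The following is a minimal sketch in standard-library Python under stated assumptions, not the authors' implementation. Mean imputation stands in for the imputation step, ranking features by their univariate AUC stands in for feature selection, and plain logistic regression stands in for the supervised learner. The outcome is treated as a binary label (TE within the follow-up window). All function names and the toy data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def impute_mean(rows: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Replace missing values (None) with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else float(r[j]) for j in range(n_cols)] for r in rows]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(random positive scores above random negative); ties count 1/2.
    Assumes at least one positive and one negative label."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def select_features(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Filter selection: keep the k columns whose univariate AUC is farthest from 0.5,
    so strongly protective features are kept as well as strongly harmful ones."""
    strength = [(abs(auc([row[j] for row in X], y) - 0.5), j) for j in range(len(X[0]))]
    strength.sort(reverse=True)
    return sorted(j for _, j in strength[:k])


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500) -> Tuple[List[float], float]:
    """Unregularized logistic regression trained by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X) -> List[float]:
    """Predicted probability of the outcome (here: TE within the follow-up window) per row."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


# Hypothetical toy cohort: columns = scaled age, prior-stroke flag, a noise variable.
# None marks a missing value; label 1 = TE event within the follow-up window.
rows = [
    [0.9, 1.0, 0.3],
    [0.8, None, 0.7],
    [0.2, 0.0, 0.5],
    [0.1, 0.0, None],
    [0.7, 1.0, 0.2],
    [0.3, 0.0, 0.9],
]
labels = [1, 1, 0, 0, 1, 0]

X = impute_mean(rows)
keep = select_features(X, labels, k=2)
X_sel = [[r[j] for j in keep] for r in X]
w, b = fit_logistic(X_sel, labels)
# Training-set AUC, for illustration only; a real evaluation uses held-out data.
train_auc = auc(predict(w, b, X_sel), labels)
print("selected columns:", keep, "training AUC:", round(train_auc, 3))
```

Computing AUC by comparing every positive with every negative costs O(n_pos * n_neg) time, so a rank-based formula is cheaper for registry-sized cohorts. In a real study the AUC would be estimated on held-out data (for example by cross-validation) rather than on the training rows, as is done in this toy run.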
