Predicting Stroke Risk With an Interpretable Classifier

Predicting an individual's risk of stroke has been studied by many authors worldwide. Stroke is a common illness, and there is strong evidence that early awareness of that risk benefits both prevention and treatment. Many governments collect medical data about their populations in order to make such predictions with artificial intelligence methods. The most accurate of these methods are so-called black-box models, which give little or no information about why they make a given prediction. In medicine, however, explanations are sometimes more important than accuracy, because they let specialists gain insight into the factors that influence the risk level. Medical records also frequently contain missing data. In this work, we present a prediction method that not only outperforms some existing ones but also indicates the most probable causes of a high stroke risk and can handle incomplete records. It is based on the Dempster-Shafer theory of plausibility. For testing, we used data provided by the regional hospital in Okayama, Japan, a country where annual health checkups are required by law. This article presents experiments comparing the Dempster-Shafer method with other well-known machine learning methods such as the Multilayer Perceptron, Support Vector Machines and Naive Bayes. Our approach performed best in these experiments when some data were missing. The article also analyzes the rules the method produces for classification. These rules were validated against both the medical literature and human specialists.
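The following is a minimal sketch, not the authors' implementation, of how a Dempster-Shafer rule classifier can combine evidence from several rules with Dempster's rule of combination, and how a missing attribute can simply leave a rule silent. Everything concrete in it is a hypothetical assumption chosen for illustration. This includes the two-class frame (`high`/`low` risk), the attribute names `hba1c`, `hdl` and `waist_cm`, the thresholds and mass values, and the choice to predict the class with the highest plausibility. None of these values come from the paper or from clinical guidelines. The described method produces its own rules.

```python
from itertools import product

# Frame of discernment with two hypothetical classes.
HIGH, LOW = "high", "low"
THETA = frozenset({HIGH, LOW})  # "either class": mass here expresses uncertainty
VACUOUS = {THETA: 1.0}          # total ignorance; identity element of Dempster's rule


def combine(m1, m2):
    """Combine two mass functions (frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # product mass that falls on the empty set
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict: the mass functions cannot be combined")
    # Renormalise by 1 - K so the surviving masses sum to 1 again.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


# Hypothetical rules: (name, predicate on a record, mass function).
# Attribute names, thresholds and masses are illustrative only; they are
# not taken from the paper or from clinical guidelines.
RULES = [
    ("hba1c >= 6.5", lambda r: r["hba1c"] >= 6.5,
     {frozenset({HIGH}): 0.4, frozenset({LOW}): 0.1, THETA: 0.5}),
    ("hdl < 40", lambda r: r["hdl"] < 40,
     {frozenset({HIGH}): 0.3, THETA: 0.7}),
    ("waist_cm < 85", lambda r: r["waist_cm"] < 85,
     {frozenset({LOW}): 0.3, THETA: 0.7}),
]


def classify(record):
    """Return (predicted class, plausibility per class, rules that fired)."""
    m = dict(VACUOUS)
    fired = []
    for name, predicate, mass in RULES:
        try:
            applies = predicate(record)
        except (KeyError, TypeError):
            # Missing attribute (absent key or None): the rule stays silent,
            # i.e. it contributes only the vacuous mass function.
            applies = False
        if applies:
            m = combine(m, mass)
            fired.append(name)
    # Plausibility of a class = total mass of the focal sets that contain it.
    pl = {c: sum(v for s, v in m.items() if c in s) for c in (HIGH, LOW)}
    # Assumed decision rule: pick the class with the highest plausibility.
    return max(pl, key=pl.get), pl, fired


# Usage: an incomplete record (HDL unknown).
label, pl, fired = classify({"hba1c": 7.0, "hdl": None, "waist_cm": 90})
print(label, fired)  # high ['hba1c >= 6.5']
```

A rule that cannot be evaluated contributes only the vacuous mass function, which is the identity element of Dempster's rule. An incomplete record therefore still receives a prediction from whichever rules do fire, and the list of fired rules gives a readable trace of why that prediction was made.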