Associating risks of getting strokes with data from health checkup records using Dempster-Shafer Theory

Predicting future diseases from patients' historical medical data is a topic of growing interest, given the increasing availability of such data in electronic form. Most existing systems are based on machine learning techniques, which are good at finding relations in data but do not help explain causality. In particular, it is difficult to obtain a meaningful medical explanation for the relationship between a patient's health checkup data and the risk of developing a certain disease. Expert system approaches such as Bayesian networks, on the other hand, are based on medical knowledge but have trouble dealing with high levels of uncertainty, and handling uncertainty is crucial in this kind of scenario. In this work we present a system that predicts a patient's risk of having a stroke (heart or brain) from past medical checkup data. The system is based on the Dempster-Shafer Theory of plausibility, which is well suited to handling uncertainty. The data come from a rural hospital in Okayama, Japan, where people are required by law to undergo annual health checkups. The model also produces rules that relate exam results to the risk of stroke, thus proposing a cause from a medical point of view. Experiments comparing the Dempster-Shafer method with other machine learning methods (multilayer perceptron, quadratic discriminant analysis, and naive Bayes) show that our approach performed best in general, with an overall prediction accuracy of 61% and the best precision on true positive cases of stroke.
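To illustrate how Dempster-Shafer theory handles uncertainty, the following is a minimal sketch of Dempster's rule of combination over the two-outcome frame {stroke, no_stroke}. It is not the model described in this work: the two mass functions (`rule_a`, `rule_b`), their values, and all function names are hypothetical and chosen only for illustration. Mass assigned to the whole frame (`EITHER`) represents explicit ignorance, which is what a single probability value cannot express.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset (focal element) to a mass;
    the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Empty intersection: the two pieces of evidence contradict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

def belief(m, hypothesis):
    """Total mass of focal elements contained in the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)

def plausibility(m, hypothesis):
    """Total mass of focal elements that do not contradict the hypothesis."""
    return sum(w for s, w in m.items() if s & hypothesis)

STROKE = frozenset({"stroke"})
NO_STROKE = frozenset({"no_stroke"})
EITHER = STROKE | NO_STROKE  # mass here means "cannot tell"

# Hypothetical evidence from two checkup-based rules (illustrative values).
rule_a = {STROKE: 0.6, EITHER: 0.4}
rule_b = {NO_STROKE: 0.3, EITHER: 0.7}

m = combine(rule_a, rule_b)
print("belief(stroke)       =", round(belief(m, STROKE), 4))
print("plausibility(stroke) =", round(plausibility(m, STROKE), 4))
```

The interval between belief and plausibility bounds the support for a stroke given both rules; a wide interval signals that much of the evidence remains uncommitted.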
