Development of Disease Prediction Model Based on Ensemble Learning Approach for Diabetes and Hypertension

Early disease prediction plays an important role in improving healthcare quality and can help individuals avoid dangerous health situations before it is too late. This paper proposes a disease prediction model (DPM) that provides early prediction of type 2 diabetes and hypertension based on an individual’s risk factor data. The proposed DPM consists of an isolation forest (iForest) based outlier detection method to remove outlier data, the synthetic minority oversampling technique with Tomek links (SMOTETomek) to balance the data distribution, and an ensemble approach to predict the diseases. Four datasets were used to build the model and extract the most significant risk factors. The results showed that the proposed DPM achieved the highest accuracy when compared with other models and previous studies. We also developed a mobile application to demonstrate the practical use of the proposed DPM. The mobile application gathers risk factor data and sends it to a remote server, where the individual’s current condition is diagnosed with the proposed DPM. The prediction result is then sent back to the mobile application, so that immediate and appropriate action can be taken to reduce the individual’s risk when an unexpected health condition (i.e., type 2 diabetes and/or hypertension) is detected at an early stage.
