Hybrid Prediction Model for Type 2 Diabetes and Hypertension Using DBSCAN-Based Outlier Detection, Synthetic Minority Over Sampling Technique (SMOTE), and Random Forest

As the risk of diabetes and hypertension increases, machine learning algorithms are being used to improve early-stage diagnosis. This study proposes a Hybrid Prediction Model (HPM) that provides early prediction of type 2 diabetes (T2D) and hypertension from an individual's input risk factors. The proposed HPM consists of three stages: Density-Based Spatial Clustering of Applications with Noise (DBSCAN)-based outlier detection to remove outlier data, the Synthetic Minority Over-Sampling Technique (SMOTE) to balance the class distribution, and Random Forest (RF) to classify the diseases. Three benchmark datasets were used to predict the risk of diabetes and hypertension at the initial stage. The results showed that integrating DBSCAN-based outlier detection, SMOTE, and RF allowed diabetes and hypertension to be predicted successfully. The proposed HPM achieved the best performance among the models compared, for diabetes as well as hypertension. Furthermore, our study demonstrated that the proposed HPM can be applied to real cases in an IoT-based Health-care Monitoring System, in which the input risk factors from an end-user Android application are stored and analyzed on a secure remote server. Users can access the prediction result from the proposed HPM through the Android application; the system is therefore expected to provide an effective way to identify the risk of diabetes and hypertension at the initial stage.
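
The first two stages of the pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`dbscan_noise_mask`, `smote`, `clean_and_balance`), the parameter values (`eps`, `min_pts`, `k`), and the toy data are all hypothetical, and applying DBSCAN to the whole dataset rather than per class is an assumption. The third stage (RF classification) is omitted here; the balanced output would be passed to a Random Forest classifier from a machine learning library.

```python
import math
import random


def dbscan_noise_mask(points, eps, min_pts):
    """Return one boolean per point: True where DBSCAN would label it noise."""
    n = len(points)
    # eps-neighbourhood of every point (each point counts as its own neighbour).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]
    # Noise: not a core point and not within eps of any core point.
    return [
        not core[i] and not any(core[j] for j in neighbors[i])
        for i in range(n)
    ]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples when n_new > 0."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbor)))
    return synthetic


def clean_and_balance(X, y, eps, min_pts, k=5, seed=0):
    """Drop DBSCAN noise points, then oversample the minority class."""
    noise = dbscan_noise_mask(X, eps, min_pts)
    X_kept = [x for x, is_noise in zip(X, noise) if not is_noise]
    y_kept = [label for label, is_noise in zip(y, noise) if not is_noise]
    counts = {label: y_kept.count(label) for label in set(y_kept)}
    minority_label = min(counts, key=counts.get)
    minority = [x for x, label in zip(X_kept, y_kept) if label == minority_label]
    n_new = max(counts.values()) - len(minority)
    synthetic = smote(minority, n_new, k, seed)
    return X_kept + synthetic, y_kept + [minority_label] * len(synthetic)


# Hypothetical toy data: six majority samples, three minority samples,
# and one isolated majority sample acting as an outlier.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
     (5, 5), (5, 6), (6, 5), (20, 20)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

X_bal, y_bal = clean_and_balance(X, y, eps=1.5, min_pts=3)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

On this toy data the isolated point at (20, 20) has no neighbours within `eps` and is dropped as noise, after which the minority class is oversampled until both classes have the same number of samples. In practice `eps` and `min_pts` would need tuning for each dataset.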
