K-Nearest Neighbor Learning based Diabetes Mellitus Prediction and Analysis for eHealth Services

Nowadays, eHealth services have become a booming area. The term refers to computer-based health care and information delivery that improves health services locally, regionally, and worldwide. An effective disease risk prediction model built by analyzing electronic health data benefits not only patient care but also the services delivered through the corresponding data-driven eHealth systems. In this paper, we focus on predicting and analysing diabetes mellitus, an increasingly prevalent chronic disease that comprises a group of metabolic disorders characterized by a high blood sugar level over a prolonged period of time. K-Nearest Neighbor (KNN) is one of the most popular and simplest machine learning techniques for building such a disease risk prediction model from relevant health data. To achieve our goal, we present an optimal K-Nearest Neighbor (Opt-KNN) learning-based prediction model that uses patients' habitual attributes in various dimensions. This approach determines the optimal number of neighbors, the one with a low error rate, to provide a better prediction outcome in the resultant model. The effectiveness of this machine learning eHealth model is examined through experiments on real-world diabetes mellitus data collected from medical hospitals.

Received on 01 September 2019, accepted on 05 January 2020, published on 15 January 2020
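
The idea of choosing the number of neighbors by its error rate can be illustrated with a minimal sketch. The abstract does not state how the error rate is estimated, so the leave-one-out estimate below, the Euclidean distance, the tie-break toward the smallest k, and the toy two-feature records in TOY_DATA are all assumptions made for illustration, not the authors' procedure or data.

```python
import math
from collections import Counter

# Hypothetical toy records: (feature vector, class label). Not real patient data.
TOY_DATA = [
    ((1.0, 1.0), 0), ((1.2, 0.9), 0), ((0.8, 1.1), 0), ((1.1, 1.3), 0),
    ((3.0, 3.0), 1), ((3.2, 2.9), 1), ((2.8, 3.1), 1), ((3.1, 3.3), 1),
]


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_error_rate(data, k):
    """Leave-one-out error rate of KNN with k neighbors (assumed estimator)."""
    errors = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out record i
        if knn_predict(train, features, k) != label:
            errors += 1
    return errors / len(data)


def select_k(data, candidates):
    """Return (best_k, error_by_k); ties go to the smallest k (assumption)."""
    error_by_k = {k: loo_error_rate(data, k) for k in candidates}
    best_k = min(candidates, key=lambda k: (error_by_k[k], k))
    return best_k, error_by_k


if __name__ == "__main__":
    best_k, error_by_k = select_k(TOY_DATA, [1, 3, 5, 7])
    print("error rate per k:", error_by_k)
    print("selected k:", best_k)
```

Odd candidate values of k avoid tied votes in a two-class problem. On real data the features would normally be scaled first, since the distance is otherwise dominated by whichever attribute has the largest numeric range.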
