Fasting Blood Glucose Change Prediction Model Based on Medical Examination Data and Data Mining Techniques

Fasting blood glucose (FBG) is an important indicator of human health. Predicting FBG is useful for detecting and treating diseases, especially diabetes mellitus. Based on four years of historical medical examination data, a model that predicts the coming year's FBG is presented. It uses traditional data mining techniques together with a novel algorithm that estimates the probability of FBG change and a proposed feature selection algorithm, which combines the feature importance scores of ensemble learning with the Sequential Backward Selection (SBS) algorithm to select an optimal feature subset. The experimental data come from a medical examination database of 108,386 users, of whom 7,136 have four years of records. Experimental results demonstrate that the feature selection algorithm improves the performance of both the support vector machine (SVM) and random forest compared with their traditional versions. The proposed method for estimating the probability of FBG change also shows promise as an intuitive description of the predictive result.
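The abstract does not spell out how the importance scores and SBS are combined, so the following is only a minimal sketch of one plausible reading: at each backward step the feature with the lowest importance is dropped, the remaining subset is re-scored, and the best-scoring subset is kept. The function `importance_guided_sbs`, the callbacks `importance_fn` and `score_fn`, and the toy feature names are all assumptions made for illustration; in practice `importance_fn` might wrap a random forest's importance scores and `score_fn` a cross-validated metric for SVM or random forest.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def importance_guided_sbs(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    score_fn: Callable[[List[str]], float],
    min_features: int = 1,
) -> Tuple[List[str], float]:
    """Backward elimination that drops the least important feature each round.

    importance_fn and score_fn are hypothetical callbacks: the first returns an
    importance per feature for the current subset, the second returns a
    validation score for that subset (higher is better).
    """
    current = list(features)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) > min_features:
        importances = importance_fn(current)
        # Remove the feature that is currently ranked lowest.
        weakest = min(current, key=lambda name: importances[name])
        current = [name for name in current if name != weakest]
        score = score_fn(current)
        # On ties, prefer the smaller subset.
        if score >= best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins (assumptions for illustration only, not real examination data).
TOY_IMPORTANCE = {"fbg_last_year": 0.6, "bmi": 0.3, "age": 0.08, "noise": 0.02}


def toy_importance(subset: List[str]) -> Dict[str, float]:
    return {name: TOY_IMPORTANCE[name] for name in subset}


def toy_score(subset: List[str]) -> float:
    # Reward informative features and penalise each extra feature slightly.
    useful = sum(TOY_IMPORTANCE[name] for name in subset if name != "noise")
    return useful - 0.05 * len(subset)


selected, score = importance_guided_sbs(
    list(TOY_IMPORTANCE), toy_importance, toy_score
)
print(selected, round(score, 2))
```

Passing the importance and scoring steps as callbacks keeps the elimination loop independent of any particular learner, so the same loop could be evaluated with either SVM or random forest as the downstream model.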
