Risk prediction of type II diabetes based on random forest model

In recent years, type II diabetes has become a serious disease that threatens both the physical and mental health of humans. Medical researchers and practitioners need efficient predictive models. This study proposes a type II diabetes prediction model based on random forest, which aims to analyze the effects of some readily available indicators (age, weight, waist, hip, etc.) on diabetes and to discover rules in the given data. The method can significantly reduce the risk of disease by extracting a clear and understandable model for type II diabetes from a medical database. The random forest algorithm trains multiple decision trees on the samples and integrates the weight of each tree to obtain the final result. Validation results at the School of Medicine, University of Virginia, show that the random forest algorithm can greatly reduce the over-fitting problem of a single decision tree, and that it can effectively predict the impact of these readily available indicators on the risk of diabetes. Additionally, random forest achieves better prediction accuracy than the naive Bayes, ID3, and AdaBoost algorithms.
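The ensemble idea described above (many trees trained on resampled data, with their outputs combined) can be illustrated with a minimal sketch. This is not the study's implementation: the toy rows (age, waist in cm) and labels are invented for illustration, one-split decision stumps stand in for full decision trees, and an unweighted majority vote is assumed as the way tree outputs are combined. All function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy data: (age, waist_cm) -> 0 = low risk, 1 = high risk.
# These values are made up for illustration, not taken from the study.
ROWS = [(25, 70), (28, 74), (32, 78), (35, 80),
        (55, 100), (58, 104), (62, 108), (65, 110)]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            # If nothing falls to the right, reuse the left label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]

def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train each stump on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Combine the stumps by unweighted majority vote (an assumption here)."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

forest = fit_forest(ROWS, LABELS)
print(predict_forest(forest, (20, 65)))
print(predict_forest(forest, (70, 115)))
```

Because each stump sees a different bootstrap sample and feature subset, no single split dominates the prediction, which is the intuition behind the reduced over-fitting relative to one decision tree.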
