Early detection of type II Diabetes Mellitus with random forest and classification and regression tree (CART)

Diabetes Mellitus (DM) is the third deadliest disease in Indonesia, and type II DM is considered more dangerous because it results from a combination of genetic and lifestyle factors. The high number of patients with type II DM is attributed to late diagnosis. Early detection is therefore needed to classify patients as having or not having type II DM, and analyzing the determinant attributes of the disease is highly recommended. This research combines the Classification and Regression Tree (CART) method with Random Forest (RF) to build a classification model for the early detection of type II DM. These methods were selected because of the characteristics of the medical-record dataset, whose complex attributes are a mix of categorical and continuous variables. CART models are easy to implement and can explore the structure of complex medical records, while RF improves classification accuracy. The research tested different numbers of trees and different numbers of candidate splitting attributes. The results show that increasing the number of trees and candidate splitting attributes improved accuracy and reduced the error rate; the optimal setting was 50 trees and 3 candidate splitting attributes, which gave an average accuracy of 83.8%. The most important attributes for the early detection of type II DM are heredity, age, and body mass index.
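The two parameters tuned in this research are easier to follow with a concrete model. Below is a minimal pure-Python sketch of a random forest built from CART-style trees. Each tree is grown on a bootstrap sample using Gini-impurity splits, each node considers only `n_candidates` randomly chosen attributes (the number of candidate splitting attributes), and the forest predicts by majority vote over `n_trees` trees. This is not the authors' implementation. The function names, the depth limit of 5, numeric encoding of categorical attributes, and the synthetic four-attribute data (one informative attribute, three noise attributes) are all assumptions for illustration, and the data are not the study's medical records.

```python
# Hypothetical sketch of CART-style trees combined into a random forest.
# Not the study's implementation; all names and the toy data are illustrative.
import random
from collections import Counter


def gini(labels):
    """Gini impurity, the classification split criterion used by CART."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return (impurity, feature, threshold) of the best binary split
    x[feature] <= threshold among the candidate features only, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # this threshold does not separate anything
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, n_candidates, rng, depth=0, max_depth=5):
    """Grow one tree; every node samples n_candidates attributes at random."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # leaf node: predict the majority class
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), min(n_candidates, n_features))
    split = best_split(rows, labels, candidates)
    if split is None:
        return majority  # all candidate attributes are constant here
    _, f, t = split
    go_left = [r[f] <= t for r in rows]
    left_rows = [r for r, g in zip(rows, go_left) if g]
    left_labels = [y for y, g in zip(labels, go_left) if g]
    right_rows = [r for r, g in zip(rows, go_left) if not g]
    right_labels = [y for y, g in zip(labels, go_left) if not g]
    return (f, t,
            build_tree(left_rows, left_labels, n_candidates, rng, depth + 1, max_depth),
            build_tree(right_rows, right_labels, n_candidates, rng, depth + 1, max_depth))


def random_forest(rows, labels, n_trees, n_candidates, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the records."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 n_candidates, rng))
    return forest


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def predict(forest, x):
    """Majority vote over all trees in the forest."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


def accuracy(forest, rows, labels):
    return sum(predict(forest, x) == y for x, y in zip(rows, labels)) / len(labels)


def make_toy_data(n=60, seed=42):
    """Synthetic records: attribute 0 decides the class, attributes 1-3 are noise."""
    rng = random.Random(seed)
    rows = [[rng.uniform(0, 10) for _ in range(4)] for _ in range(n)]
    labels = [1 if r[0] > 5 else 0 for r in rows]
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_toy_data()
    for n_trees in (10, 25, 50):
        forest = random_forest(rows, labels, n_trees=n_trees, n_candidates=3)
        print(n_trees, "trees, training accuracy:", accuracy(forest, rows, labels))
```

The setting `n_trees=50`, `n_candidates=3` mirrors the optimum reported in the abstract, but the printed figures are training accuracy on toy data. Comparing settings the way the study did would require evaluating each forest on held-out records.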