Discovering Decision Tree Based Diabetes Prediction Model

Data mining techniques have been applied extensively in bioinformatics to analyze biomedical data. In this paper, we use Rapid-I’s RapidMiner to discover a decision-tree-based diabetes prediction model from the Pima Indians Diabetes Data Set, which records information on patients who did and did not develop diabetes. Following the data mining process, our discussion focuses on data preprocessing (attribute identification and selection, outlier removal, data normalization, and numerical discretization), visual data analysis, the discovery of hidden relationships, and the construction of a diabetes prediction model.
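
To make the preprocessing and model-construction steps concrete, the following is a minimal sketch in plain Python rather than RapidMiner, so it does not reproduce the workflow used in this paper. The function names, the z-score outlier rule, equal-width binning, and the Gini split criterion are all assumptions chosen for illustration, and the glucose values are hypothetical toy numbers, not records from the Pima Indians Diabetes Data Set.

```python
from statistics import mean, stdev


def remove_outliers(rows, col, z_max=3.0):
    """Drop rows whose value in `col` lies more than z_max sample
    standard deviations from the mean (an assumed outlier rule)."""
    values = [r[col] for r in rows]
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return list(rows)
    return [r for r in rows if abs(r[col] - mu) / sigma <= z_max]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, n_bins=3):
    """Equal-width binning: map each value to a bin index 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def gini(labels):
    """Gini impurity of a list of binary (0/1) class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Find the threshold t whose split (v <= t vs. v > t) gives the
    lowest weighted Gini impurity. This is the step a decision tree
    learner repeats at every node. Returns (threshold, impurity)."""
    best = (None, gini(labels))
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy records (NOT taken from the real data set).
rows = [
    {"glucose": 85, "diabetic": 0},
    {"glucose": 90, "diabetic": 0},
    {"glucose": 100, "diabetic": 0},
    {"glucose": 150, "diabetic": 1},
    {"glucose": 160, "diabetic": 1},
    {"glucose": 170, "diabetic": 1},
    {"glucose": 9999, "diabetic": 1},  # implausible reading, treated as an outlier
]

clean = remove_outliers(rows, "glucose", z_max=2.0)
glucose = [r["glucose"] for r in clean]
labels = [r["diabetic"] for r in clean]

print(min_max_normalize(glucose))
print(discretize(glucose, n_bins=3))
print(best_split(glucose, labels))
```

A full decision tree would apply `best_split` recursively to each resulting subset and across all attributes; the sketch stops at a single split to keep the idea visible. Note that with only seven records the largest attainable z-score is about 2.27, which is why the example passes `z_max=2.0` instead of relying on the default of 3.0.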