Fused Features Classification for the Effective Prediction of Chronic Kidney Disease

This paper applies data mining to improve the accuracy of disease-state prediction by selecting the features most relevant to the disease. The experiments are performed on chronic kidney disease (CKD) data. The underlying idea is that combining several feature selection methods, rather than relying on a single one, increases the probability of selecting features that are closely related to the disease. Multiple feature selection methods are applied independently to the CKD data set, and their results are integrated into a final, optimal set of features. The data restricted to this fused feature set are then passed to classifiers that distinguish CKD cases from reference cases. Several classification methods are compared using 10-fold cross-validation on the training data set to select the best model. The Random Forest classifier performs best and is chosen as the final model.
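The workflow described above can be illustrated with a minimal sketch, assuming scikit-learn and a synthetic stand-in for the CKD data. The three selectors (ANOVA F-score, mutual information, Random Forest importance), the majority-vote fusion rule, the value of `K`, and the two comparison classifiers are all illustrative assumptions; the abstract does not name the selection methods, the integration rule, or the classifiers other than Random Forest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the CKD data (assumed size: 400 cases, 24 features).
X, y = make_classification(n_samples=400, n_features=24, n_informative=6,
                           n_redundant=2, random_state=0)

K = 8  # hypothetical number of features each selector keeps


def top_k_mask(scores, k):
    """Boolean mask marking the k highest-scoring features."""
    mask = np.zeros(scores.shape[0], dtype=bool)
    mask[np.argsort(scores)[-k:]] = True
    return mask


# Run three feature selection methods independently (assumed choices).
f_scores, _ = f_classif(X, y)
mi_scores = mutual_info_classif(X, y, random_state=0)
rf_scores = RandomForestClassifier(
    n_estimators=200, random_state=0).fit(X, y).feature_importances_

# Fuse the results: keep features chosen by at least two of the three methods.
masks = np.vstack([top_k_mask(s, K) for s in (f_scores, mi_scores, rf_scores)])
votes = masks.sum(axis=0)
fused = np.flatnonzero(votes >= 2)
if fused.size == 0:  # fall back to the union if no two methods agree
    fused = np.flatnonzero(votes >= 1)

# Compare classifiers on the fused features with 10-fold cross-validation.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
cv_accuracy = {
    name: cross_val_score(model, X[:, fused], y, cv=10).mean()
    for name, model in models.items()
}
best_model = max(cv_accuracy, key=cv_accuracy.get)

print("fused feature indices:", fused.tolist())
for name, acc in cv_accuracy.items():
    print(f"{name}: mean 10-fold accuracy = {acc:.3f}")
print("best model:", best_model)
```

For brevity, this sketch selects features on the same data used for cross-validation, which tends to give optimistic accuracy estimates. A more careful variant would perform the selection inside each fold, or on a separate training split.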