A Comparative Analysis on Feature Selection Techniques for Medical Datasets

Feature selection has become a vital step in many data mining applications, such as classification. It can increase the accuracy of a classifier by eliminating irrelevant attributes, and high-quality features may enhance the classification process and produce better results. This study aims to find the most appropriate features, those that may lead to the best accuracy, for various datasets from the same domain, namely the medical domain. We compare benchmark feature selection methods on eight medical datasets using two well-recognized machine learning algorithms, Naive Bayes and KNN, and analyze their performance with and without feature selection in terms of F-Measure and ROC. The experiments show that feature selection methods are capable of improving the performance of the learning algorithms. However, some of the results involve feature subsets that contain the same total number of attributes but differ in the attributes selected, and no single feature selection method performs best across all datasets and learning algorithms. To strengthen the results, we evaluate these equally sized but distinct feature subsets with the Naive Bayes classifier in terms of accuracy. This enables us to obtain the optimal dimensionality of the feature subsets.
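
The kind of comparison described above can be sketched as follows. This is a minimal illustration only, not the study's actual setup: it assumes scikit-learn, uses its bundled breast cancer dataset as a stand-in for a medical dataset, and uses a univariate ANOVA F-test filter (`SelectKBest` with `f_classif`) as a stand-in for a feature selection method. The choice of 10 features, 10-fold cross-validation, and the helper name `evaluate` are likewise assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Stand-in medical dataset (assumption: not one of the study's datasets).
X, y = load_breast_cancer(return_X_y=True)

# The two learning algorithms named in the abstract.
classifiers = {"NaiveBayes": GaussianNB, "KNN": KNeighborsClassifier}


def evaluate(k_features=None, folds=10):
    """Return {classifier name: (mean F-measure, mean ROC AUC)}.

    Hypothetical helper. If k_features is None, no feature selection
    is applied; otherwise the top k_features are kept by an ANOVA F-test.
    """
    results = {}
    for name, make_clf in classifiers.items():
        steps = [make_clf()]
        if k_features is not None:
            # Selection sits inside the pipeline so it is refit on each
            # training fold and never sees the held-out fold.
            steps.insert(0, SelectKBest(f_classif, k=k_features))
        model = make_pipeline(*steps)
        cv = cross_validate(model, X, y, cv=folds, scoring=("f1", "roc_auc"))
        results[name] = (cv["test_f1"].mean(), cv["test_roc_auc"].mean())
    return results


baseline = evaluate()               # all attributes
selected = evaluate(k_features=10)  # assumed subset size

for name in classifiers:
    print(name, "without selection (F, AUC):", baseline[name])
    print(name, "with selection    (F, AUC):", selected[name])
```

Varying `k_features` while holding the classifier fixed is one simple way to probe how performance depends on the dimensionality of the feature subset.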