Performance Tuning of PCA by CFS-Shapley Ensemble and Its Application to Medical Diagnosis

Selection of optimal features is an important area of research in medical data mining systems. Principal component analysis (PCA) is one of the most popular feature selection methods. However, PCA has a drawback: the measurements from all of the original features are used in the projection to the lower-dimensional space. This work therefore aims to tune the performance of PCA and to classify medical profiles. The proposed method is realized as an ensemble procedure with three steps: (i) feature selection using PCA, (ii) feature ranking with CFS (correlation-based feature selection) and (iii) dimension reduction using Shapley value analysis. The variance coverage parameter of PCA is adjusted to maximize classification performance, which is measured with specificity, sensitivity, precision and recall. This facilitates the selection of a compact set of superior features without compromising detection rates, and at low cost. To evaluate the proposed method, experiments were conducted on six different medical data sets using the J48 decision tree classifier; the results showed that the proposed procedure improves classification efficiency and accuracy compared with applying the individual methods alone.
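To make the "variance coverage" parameter and the evaluation measures concrete, here is a minimal sketch, assuming a numeric feature matrix and binary class labels coded 1 (positive) and 0 (negative). The function names `pca_by_coverage` and `binary_metrics` are hypothetical. The sketch covers only the PCA stage and the metrics; it does not reproduce the CFS ranking, the Shapley value analysis or the J48 classifier.

```python
import numpy as np


def pca_by_coverage(X, coverage=0.95):
    """Project X onto the fewest principal components whose cumulative
    explained variance reaches `coverage` (0 < coverage <= 1).
    Hypothetical helper; returns (projected_data, k)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    cov = np.cov(Xc, rowvar=False)               # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # sort components by variance, descending
    eigvals = np.clip(eigvals[order], 0.0, None) # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()   # cumulative variance covered
    # first index where the covered variance reaches the target, capped at n_features
    k = min(int(np.searchsorted(ratio, coverage)) + 1, len(eigvals))
    return Xc @ eigvecs[:, :k], k


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision, recall and accuracy for 1/0 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe(num, den):
        return num / den if den else 0.0

    sensitivity = safe(tp, tp + fn)  # for the positive class this equals recall
    return {
        "sensitivity": sensitivity,
        "specificity": safe(tn, tn + fp),
        "precision": safe(tp, tp + fp),
        "recall": sensitivity,
        "accuracy": safe(tp + tn, len(pairs)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    for cov_target in (0.5, 0.75, 0.95):
        _, k = pca_by_coverage(X, cov_target)
        print(f"coverage={cov_target}: keep {k} components")
    print(binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 1]))
```

Tuning the coverage parameter, as described above, would amount to sweeping `coverage` over a grid, training a classifier on each projection, and keeping the value whose metrics are best. That loop is omitted here because the classifier (J48 in the experiments) is outside the scope of this sketch. Note that sensitivity and recall are the same quantity for the positive class, so the sketch computes it once.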