A Novel Classifier-Independent Feature Selection Algorithm for Imbalanced Datasets

A novel classifier-independent feature selection algorithm based on the posterior probability is proposed for imbalanced datasets. First, an imbalanced factor is introduced and computed by Parzen-window estimation. The middle point of Tomek links is chosen as the initial point. Accordingly, this algorithm is iterated to find out the boundary points which have the equality of posterior probability. Through the project computation on the normal vectors of these points, the weight of each feature can be obtained, which actually indicates the importance degree of each feature. The experimental results on 3 real-word datasets demonstrate that this proposed algorithm can not only reduce the computational cost but also overcome the shortcoming that the majority class may be detected well but the minority class may be ignored in the conventional feature selection algorithm.