Invariant optimal feature selection: A distance discriminant and feature ranking based solution

The goal of feature selection is to find the optimal subset of m features chosen from the full set of n features. A critical problem for many feature selection methods is that an exhaustive search must be applied to find the best subset among all $\binom{n}{m}$ possible feature subsets, which usually incurs prohibitive computational cost. Suboptimal feature selection methods are more practical in terms of computational complexity, but they cannot guarantee that the selected feature subset is globally optimal. We propose a new feature selection algorithm based on a distance discriminant (FSDD), which both avoids the high computational cost of exhaustive search and overcomes the drawbacks of suboptimal methods: it finds the optimal feature subset without exhaustive search or the branch-and-bound algorithm. The hardest part of optimal feature selection, the search problem, is converted into a feature ranking problem through a rigorous theoretical proof, so the computational complexity is greatly reduced. The proposed method is also invariant to linear transformations of the data whenever the transformation matrix is diagonal. FSDD was compared with ReliefF and the mutual-information-based mrmrMID on eight data sets; the experimental results show that FSDD outperforms both methods and is highly efficient.
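To make the search-to-ranking reduction concrete, below is a minimal sketch of a distance-discriminant-style feature ranker. The exact FSDD criterion is defined in the paper itself; the score used here (per-feature between-class scatter minus a beta-weighted within-class scatter, computed on variance-normalized features) is an illustrative stand-in, and the names `rank_features` and `beta` are hypothetical, not from the source.

```python
# Sketch only: a distance-discriminant-style per-feature score, not the
# paper's exact FSDD criterion. Each feature gets an independent score,
# so selecting the best m-subset reduces to sorting (ranking) n scores.
import numpy as np

def rank_features(X, y, beta=1.0):
    """Rank features by between-class minus beta * within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Normalizing each feature by its standard deviation makes the score
    # invariant to a diagonal (per-feature rescaling) linear transformation.
    X = X / (X.std(axis=0, ddof=1) + 1e-12)
    overall_mean = X.mean(axis=0)
    n = len(y)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        p = len(Xc) / n                            # class prior
        mc = Xc.mean(axis=0)
        between += p * (mc - overall_mean) ** 2    # between-class scatter
        within += p * Xc.var(axis=0, ddof=1)       # within-class scatter
    scores = between - beta * within
    order = np.argsort(scores)[::-1]               # best feature first
    return order, scores

# Usage: keep the top-m ranked features.
# order, scores = rank_features(X_train, y_train)
# X_selected = X_train[:, order[:m]]
```

Note the design point this sketch illustrates: because the criterion decomposes over individual features, finding the optimal m-subset costs one pass over the data plus an O(n log n) sort, rather than a search over $\binom{n}{m}$ candidate subsets.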
