Paper Information - Improving Stability of Feature Selection Methods

An improper design of feature selection methods can often lead to incorrect conclusions. Moreover, it is not generally realised that the values of the criterion function guiding the search for the best feature set are random variables with some probability distribution. This contribution examines the influence of several estimation techniques on the consistency of the final result. We propose an entropy-based measure that assesses the stability of feature selection methods with respect to perturbations in the data. Results show that filters achieve better stability and performance when more samples are employed for the estimation, for instance with leave-one-out cross-validation. However, the best results for wrappers are obtained with 50/50 holdout validation.
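The abstract does not give the formula of the proposed measure, so the following is only a minimal sketch of one plausible entropy-based stability score, not the authors' definition. It assumes that the selector is run on several perturbed versions of the data and that stability is judged by the Shannon entropy of the empirical distribution over the selected feature subsets. The function names `subset_entropy` and `normalized_stability` are hypothetical, as is the normalisation by the logarithm of the number of runs.

```python
import math
from collections import Counter


def subset_entropy(selected_subsets):
    """Shannon entropy (in bits) of the empirical distribution of selected subsets.

    Assumption: each element of `selected_subsets` is the collection of feature
    indices chosen in one run on a perturbed copy of the data.
    """
    counts = Counter(frozenset(s) for s in selected_subsets)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_stability(selected_subsets):
    """Illustrative score in [0, 1]: 1 if every run picks the same subset,
    0 if every run picks a different one. This normalisation is an assumption
    made for the sketch, not taken from the paper."""
    runs = len(selected_subsets)
    if runs < 2:
        return 1.0
    return 1.0 - subset_entropy(selected_subsets) / math.log2(runs)


# Hypothetical outcomes of four runs of a selector on resampled data.
stable_runs = [{0, 2}, {2, 0}, {0, 2}, {0, 2}]
unstable_runs = [{0, 2}, {1, 3}, {0, 1}, {2, 3}]
print(normalized_stability(stable_runs))
print(normalized_stability(unstable_runs))
```

Under these assumptions, lower entropy means the selector returns the same subset more consistently across perturbations. The same score can be computed for subsets produced under different estimation schemes (for example leave-one-out cross-validation versus a 50/50 holdout) to compare their effect on consistency.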
