Paper information - Ensemble feature selection for high dimensional data: a new method and a comparative study

The curse of dimensionality refers to the fact that high dimensional data is often difficult to work with. A large number of features can increase the noise in the data and thus the error of a learning algorithm. Feature selection addresses such problems by reducing the data dimensionality. Different feature selection algorithms may yield feature subsets that can be considered local optima in the space of feature subsets. Ensemble feature selection combines independent feature subsets and might give a better approximation to the optimal subset of features. We propose an ensemble feature selection approach based on an assessment of the feature selectors' reliability. It aims at providing a unique and stable feature selection without ignoring predictive accuracy. A classification algorithm is used as an evaluator to assign a confidence to the features selected by each ensemble member, based on the associated classification performance. We compare our proposed approach to several existing techniques and to individual feature selection algorithms. Results show that our approach often improves classification performance and feature selection stability for high dimensional data sets.
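The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes binary 0/1 labels, bootstrap resampling to create ensemble members, an absolute-correlation filter as the base selector, and a nearest-centroid classifier as the evaluator. All function names and parameters (`top_k_by_correlation`, `nearest_centroid_accuracy`, `ensemble_select`, `make_toy_data`, `k`, `n_members`) are hypothetical and chosen for illustration.

```python
import numpy as np


def top_k_by_correlation(X, y, k):
    """Base selector (assumed): the k features most correlated with y in absolute value."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    return np.argsort(scores)[::-1][:k]


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Evaluator (assumed): test accuracy of a nearest-centroid classifier."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predictions = classes[dists.argmin(axis=1)]
    return float((predictions == y_test).mean())


def ensemble_select(X, y, k=5, n_members=20, seed=0):
    """Aggregate member subsets, weighting each by its evaluator accuracy."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    confidence = np.zeros(n_features)
    for _ in range(n_members):
        # Each ensemble member sees a bootstrap sample of the data.
        boot = rng.integers(0, n_samples, size=n_samples)
        oob = np.setdiff1d(np.arange(n_samples), boot)
        if oob.size == 0 or np.unique(y[boot]).size < 2:
            continue  # nothing to evaluate on, or only one class drawn
        subset = top_k_by_correlation(X[boot], y[boot], k)
        # Reliability of this member: accuracy on its out-of-bag samples.
        reliability = nearest_centroid_accuracy(
            X[boot][:, subset], y[boot], X[oob][:, subset], y[oob]
        )
        # Every feature the member selected inherits that reliability.
        confidence[subset] += reliability
    selected = np.argsort(confidence)[::-1][:k]
    return selected, confidence


def make_toy_data(n_samples=80, n_features=200, seed=1):
    """Synthetic data: only the first three features carry class signal."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_samples // 2)
    X = rng.normal(size=(n_samples, n_features))
    X[y == 1, :3] += 4.0
    return X, y


X, y = make_toy_data()
selected, confidence = ensemble_select(X, y, k=5)
print("selected features:", sorted(selected.tolist()))
```

Weighting by out-of-bag accuracy is one simple way to keep predictive performance in the aggregation step; features picked consistently by reliable members accumulate the highest confidence, which is also what favors a stable final subset.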

Mohamed Limam | Afef Ben Brahim
