Paper information - A Model-Based Relevance Estimation Approach for Feature Selection in Microarray Datasets


The paper presents an original model-based approach to feature selection and its application to the classification of microarray datasets. Model-based approaches to feature selection are generally known as wrappers. Wrapper methods assess subsets of variables according to their usefulness to a given prediction model, which will eventually be used for classification. This strategy assumes that the accuracy of the model used for the wrapper selection is a good estimator of the relevance of the feature subset. First, we discuss the limits of this assumption by showing that the assessment of a subset by means of a generic learner (e.g. by cross-validation) returns a biased estimate of the relevance of the subset itself. Second, we propose a low-bias estimator of the relevance based on the cross-validation assessment of an unbiased learner. Third, we assess a feature selection approach that combines the low-bias relevance estimator with state-of-the-art relevance estimators in order to enhance their accuracy. The experimental validation on 20 publicly available cancer expression datasets shows the robustness of a selection approach that is not biased by a specific learner.
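The wrapper-style assessment described in the abstract can be illustrated with a minimal sketch: a feature subset is scored by the cross-validated accuracy of one particular learner. Everything here is an assumption made for illustration and is not taken from the paper: the nearest-centroid learner, the helper names (`nearest_centroid_predict`, `cv_accuracy`, `make_toy_data`), and the synthetic two-feature dataset. It is not the low-bias estimator the authors propose; it only shows the kind of learner-dependent score whose bias the paper discusses.

```python
import random
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the class whose centroid is closest to x (squared Euclidean distance)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def cv_accuracy(X, y, subset, k=5, seed=0):
    """Wrapper-style score: k-fold cross-validated accuracy using only the columns in `subset`."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k disjoint folds covering all samples
    proj = [[row[j] for j in subset] for row in X]  # keep only the candidate features
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [proj[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            correct += nearest_centroid_predict(train_X, train_y, proj[i]) == y[i]
    return correct / len(X)


def make_toy_data(n=60, seed=1):
    """Hypothetical data: column 0 depends on the class label, column 1 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(4.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
score_informative = cv_accuracy(X, y, subset=[0])
score_noise = cv_accuracy(X, y, subset=[1])
print(f"informative feature: {score_informative:.2f}, noise feature: {score_noise:.2f}")
```

The score returned by `cv_accuracy` depends on the chosen learner as much as on the subset, which is the assumption the abstract questions: a different learner could rank the same subsets differently.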

Gianluca Bontempi | Patrick E. Meyer
