Robust Feature Selection for Microarray Data Based on Multicriterion Fusion

Feature selection often aims to select a compact feature subset to build a pattern classifier with reduced complexity, so as to achieve improved classification performance. From the perspective of pattern analysis, producing a stable or robust solution is also a desired property of a feature selection algorithm. However, the issue of robustness is often overlooked in feature selection. In this study, we analyze the robustness issue in feature selection for high-dimensional, small-sample-size gene-expression data, and propose to improve the robustness of feature selection algorithms by using multiple feature selection evaluation criteria. Based on this idea, a multicriterion fusion-based recursive feature elimination (MCF-RFE) algorithm is developed with the goal of improving both the classification performance and the stability of feature selection results. Experimental studies on five gene-expression data sets show that MCF-RFE outperforms the commonly used benchmark feature selection algorithm SVM-RFE.
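The abstract does not specify which criteria MCF-RFE fuses or how they are combined, so the Python sketch below only illustrates the general idea under stated assumptions. Two stand-in criteria are used: a univariate Fisher-style score, and the absolute weight of a ridge-regularised linear model fit by gradient descent (a stand-in for the SVM weights that SVM-RFE ranks by). The criteria are fused with a Borda count, and the feature with the lowest fused score is removed at each iteration. Robustness is estimated, again as an assumption, by the mean pairwise Jaccard similarity of the subsets selected on bootstrap resamples. All function names (`mcf_rfe`, `fisher_scores`, `linear_weight_scores`, `borda_fuse`, `jaccard_stability`) are hypothetical and not taken from the paper.

```python
"""Hypothetical sketch of multicriterion-fusion RFE; not the authors' code.

Assumptions: two stand-in ranking criteria, Borda-count fusion, one feature
removed per iteration, and Jaccard-based stability over bootstrap resamples.
"""
import random
from statistics import mean, pstdev


def fisher_scores(X, y, feats):
    """Univariate criterion: |mu1 - mu0| / (sd1 + sd0) per feature (labels 0/1)."""
    scores = {}
    for j in feats:
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores[j] = abs(mean(pos) - mean(neg)) / (pstdev(pos) + pstdev(neg) + 1e-12)
    return scores


def linear_weight_scores(X, y, feats, lr=0.05, epochs=200, lam=0.01):
    """Multivariate criterion: |w_j| of a ridge-regularised least-squares model
    fit by batch gradient descent on targets in {-1, +1} (stand-in for SVM weights)."""
    n = len(X)
    t = [1.0 if lab == 1 else -1.0 for lab in y]
    w = {j: 0.0 for j in feats}
    b = 0.0
    for _ in range(epochs):
        grad_w = {j: lam * w[j] for j in feats}
        grad_b = 0.0
        for row, target in zip(X, t):
            err = sum(w[j] * row[j] for j in feats) + b - target
            for j in feats:
                grad_w[j] += err * row[j] / n
            grad_b += err / n
        for j in feats:
            w[j] -= lr * grad_w[j]
        b -= lr * grad_b
    return {j: abs(w[j]) for j in feats}


def borda_fuse(score_dicts):
    """Borda count: within each criterion, the r-th lowest score earns r points."""
    totals = {}
    for scores in score_dicts:
        for points, j in enumerate(sorted(scores, key=scores.get)):
            totals[j] = totals.get(j, 0) + points
    return totals


def mcf_rfe(X, y, n_keep):
    """Recursively eliminate the feature with the lowest fused (Borda) score."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        fused = borda_fuse([fisher_scores(X, y, remaining),
                            linear_weight_scores(X, y, remaining)])
        worst = min(remaining, key=lambda j: (fused[j], j))  # ties: lowest index
        remaining.remove(worst)
    return sorted(remaining)


def jaccard_stability(subsets):
    """Mean pairwise Jaccard similarity of selected subsets (1.0 = identical)."""
    pairs = [(set(a), set(b)) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return mean(len(a & b) / len(a | b) for a, b in pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: features 0 and 1 carry the class signal, features 2-5 are noise.
    y = [i % 2 for i in range(40)]
    X = [[lab + rng.gauss(0, 0.5), -lab + rng.gauss(0, 0.5)]
         + [rng.gauss(0, 1) for _ in range(4)] for lab in y]
    print("selected:", mcf_rfe(X, y, n_keep=2))
    # Re-run selection on bootstrap resamples and measure agreement.
    subsets = []
    for _ in range(5):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        subsets.append(mcf_rfe([X[i] for i in idx], [y[i] for i in idx], n_keep=2))
    print("stability:", jaccard_stability(subsets))
```

Removing one feature per iteration keeps the sketch short; on microarray data with thousands of genes, RFE variants commonly drop a fraction of the remaining features at each step to keep the cost manageable.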

Feng Yang | Kezhi Mao