A Novel Recursive Feature Subset Selection Algorithm

Univariate filter methods, which rank individual genes by how well each one separates the classes, are widely used for gene ranking in microarray gene expression analysis. These methods rank all genes using all samples. However, some samples may never be classified correctly no matter how many additional top-ranked genes are included, because the methods keep adding redundant genes that cover only part of the sample space; as a result, the returned gene subset may never cover the space completely. In this paper we introduce a new gene subset selection approach that recursively adds genes covering the part of the space not yet covered by the genes already selected. Our approach leads to significant improvement on many different benchmark datasets.

Keywords: gene selection; filter methods; gene expression; microarray; ranking functions.
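
The recursive idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the ranking function, the definition of "covered", or the stopping rule. The sketch assumes binary labels in {0, 1}, a signal-to-noise ranking score, and a midpoint-threshold rule that treats a sample as covered when the chosen gene alone classifies it correctly. The names `snr_score`, `covered_mask`, and `recursive_select` are hypothetical.

```python
import numpy as np

def snr_score(x, y):
    """Assumed ranking function: signal-to-noise ratio of one gene, labels in {0, 1}."""
    a, b = x[y == 0], x[y == 1]
    return abs(a.mean() - b.mean()) / (a.std() + b.std() + 1e-12)

def covered_mask(x, y):
    """Assumed coverage rule: samples classified correctly by a midpoint threshold."""
    m0, m1 = x[y == 0].mean(), x[y == 1].mean()
    threshold = (m0 + m1) / 2.0
    if m1 > m0:
        pred = (x > threshold).astype(int)
    else:
        pred = (x <= threshold).astype(int)
    return pred == y

def recursive_select(X, y, max_genes=10):
    """Greedily pick genes, re-ranking each time on the still-uncovered samples only."""
    remaining = np.arange(X.shape[0])  # indices of samples not yet covered
    selected = []
    # Stop when the budget is spent or the uncovered samples no longer span both classes.
    while len(selected) < max_genes and len(np.unique(y[remaining])) == 2:
        Xr, yr = X[remaining], y[remaining]
        candidates = [g for g in range(X.shape[1]) if g not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda g: snr_score(Xr[:, g], yr))
        covered = covered_mask(Xr[:, best], yr)
        if not covered.any():
            break  # the best remaining gene covers nothing new
        selected.append(best)
        remaining = remaining[~covered]
    return selected
```

In contrast to a single global ranking, each round here scores genes only on the samples that earlier selections failed to classify, so a gene that looks uninformative over the full dataset can still be chosen if it separates the leftover samples.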
