Feature selection algorithms to find strong genes

cDNA microarray technology allows us to estimate the expression levels of thousands of genes in a given tissue. It is natural to use this information to classify cell states, such as healthy versus diseased, or one type of cancer versus another. However, the number of microarray samples is usually very small, which leads to a classification problem with only tens of samples and thousands of features. Recently, Kim et al. proposed using a parameterized distribution based on the original sample set to attenuate this difficulty. Genes that contribute to good classifiers in this setting are called strong. In this paper, we investigate how feature selection techniques can speed up the search for strong genes. The idea is to use a feature selection algorithm to filter the set of genes before applying the original strong feature technique, which is based on a combinatorial search. The filtering helps us find very good strong gene sets without resorting to supercomputers. We tested several filter options and compared the resulting strong genes with those obtained by the original full combinatorial search.
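The filter-then-search idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the filter is a per-gene t-statistic ranking, the classifier is a nearest-centroid rule, and the parameterized distribution of Kim et al. is stood in for by Gaussian noise of standard deviation `sigma` added to copies of the original samples. The names `t_scores`, `spread_error`, and `filtered_search` are hypothetical.

```python
import itertools
import numpy as np

def t_scores(X, y):
    """Absolute two-sample t-statistic for each gene (column of X)."""
    a, b = X[y == 0], X[y == 1]
    num = a.mean(axis=0) - b.mean(axis=0)
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(num) / (den + 1e-12)

def spread_error(X, y, sigma, n_copies, rng):
    """Error of a nearest-centroid classifier on noise-perturbed copies.

    Assumption: Gaussian perturbation of each sample is used here as a
    simple Monte Carlo stand-in for a distribution built around the
    original sample set.
    """
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    noise = rng.normal(0.0, sigma, size=(len(X) * n_copies, X.shape[1]))
    noisy = np.repeat(X, n_copies, axis=0) + noise
    labels = np.repeat(y, n_copies)
    d0 = ((noisy - c0) ** 2).sum(axis=1)
    d1 = ((noisy - c1) ** 2).sum(axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred != labels).mean())

def filtered_search(X, y, keep=20, subset_size=2, sigma=0.5, n_copies=50, seed=0):
    """Filter genes by t-score, then search all subsets of the survivors."""
    rng = np.random.default_rng(seed)
    # Filter step: keep only the `keep` highest-scoring genes.
    candidates = np.argsort(t_scores(X, y))[::-1][:keep]
    best_err, best_subset = np.inf, None
    # Combinatorial step: exhaustive search restricted to the filtered genes.
    for subset in itertools.combinations(candidates, subset_size):
        err = spread_error(X[:, list(subset)], y, sigma, n_copies, rng)
        if err < best_err:
            best_err, best_subset = err, tuple(int(g) for g in subset)
    return best_subset, best_err

# Synthetic example: 30 samples, 500 genes, genes 3 and 7 made informative.
rng = np.random.default_rng(1)
y = np.array([0] * 15 + [1] * 15)
X = rng.normal(size=(30, 500))
X[y == 1, 3] += 3.0
X[y == 1, 7] += 3.0
subset, err = filtered_search(X, y)
print(subset, err)
```

With `keep=20` and `subset_size=2`, the search evaluates 190 gene pairs instead of the 124,750 pairs available among all 500 genes, which is the kind of saving the filtering step is meant to provide.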
