A model-free and stable gene selection in microarray data analysis

Microarray data analysis is notorious for involving a huge number of genes compared to a relatively small number of samples. Gene selection, that is, detecting the genes that are most significantly differentially expressed under different conditions, has been a central focus for researchers. The problem becomes more difficult when the numbers of samples under the different conditions vary significantly, i.e., when the samples are unbalanced. This paper proposes a novel gene selection method that is model-free and stable: it does not assume any statistical model for the gene expression data, and it is not affected by unbalanced samples. The method was evaluated on two publicly available datasets, the leukemia dataset and the small round blue cell tumor dataset, and the experimental results showed that it is efficient and robust in identifying differentially expressed genes.
