Evaluation Models for the Effect of Sample Imbalance on Gene Selection

In this paper, we consider the problem of sample imbalance in the context of gene selection. Based on simple random sampling, we propose two evaluation models to investigate the effect of sample imbalance on gene selection. Under the proposed evaluation models, we compare the performance of five well-known gene selection methods on unbalanced data. The experimental results indicate that the proposed evaluation models are effective and that sample imbalance has a strong influence on gene selection. Our findings provide guidelines for the design of microarray experiments and the subsequent data analysis, and the two evaluation models can be used to choose a suitable gene selection method for identifying differentially expressed genes.
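The abstract does not specify the two evaluation models, so what follows is only a minimal sketch of one way the effect of sample imbalance on gene selection could be examined with simple random sampling. It assumes a two-class expression matrix (genes in rows, samples in columns). It uses a Welch t-statistic ranking as a stand-in gene selection method and scores each subsample by the fraction of its top-k genes that also appear in the top-k list from the full data. The function names (`t_statistics`, `top_genes`, `imbalance_overlap`), the overlap criterion, and the simulated data are illustrative assumptions. They are not the authors' models or the five methods compared in the paper.

```python
import numpy as np

# Hypothetical sketch: NOT the paper's evaluation models. Welch t ranking and
# top-k overlap are assumed stand-ins chosen for illustration only.

def t_statistics(x, labels):
    """Welch two-sample t-statistic per gene (rows = genes, columns = samples)."""
    a = x[:, labels == 0]
    b = x[:, labels == 1]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] +
                 b.var(axis=1, ddof=1) / b.shape[1])
    return mean_diff / se

def top_genes(x, labels, k):
    """Indices of the k genes with the largest absolute t-statistic."""
    t = t_statistics(x, labels)
    return set(np.argsort(-np.abs(t))[:k].tolist())

def imbalance_overlap(x, labels, n0, n1, k, n_rep, rng):
    """Mean fraction of the full-data top-k genes recovered from subsamples
    drawn by simple random sampling (without replacement) of n0 class-0
    and n1 class-1 samples."""
    reference = top_genes(x, labels, k)
    idx0 = np.flatnonzero(labels == 0)
    idx1 = np.flatnonzero(labels == 1)
    overlaps = []
    for _ in range(n_rep):
        cols = np.concatenate([rng.choice(idx0, n0, replace=False),
                               rng.choice(idx1, n1, replace=False)])
        selected = top_genes(x[:, cols], labels[cols], k)
        overlaps.append(len(selected & reference) / k)
    return float(np.mean(overlaps))

# Simulated data (assumption): 1000 genes, 50 samples per class,
# the first 50 genes shifted by 1.5 in class 1.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 100))
labels = np.repeat([0, 1], 50)
x[:50, labels == 1] += 1.5

# Same total subsample size (20), balanced vs. unbalanced class ratio.
balanced = imbalance_overlap(x, labels, 10, 10, k=50, n_rep=30, rng=rng)
unbalanced = imbalance_overlap(x, labels, 17, 3, k=50, n_rep=30, rng=rng)
print(f"balanced 10:10 overlap = {balanced:.2f}")
print(f"unbalanced 17:3 overlap = {unbalanced:.2f}")
```

Holding the total subsample size fixed while changing only the class ratio isolates the effect of imbalance from that of sample size. Any other ranking function could be substituted for `t_statistics` to compare gene selection methods under the same sampling scheme.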
