Advanced significance analysis of microarray data based on weighted resampling: a comparative study and application to gene deletions in Mycobacterium bovis

MOTIVATION: When microarray data are analyzed, non-biological variation introduces uncertainty into both the analysis and its interpretation. Here we focus on validating significant differences in gene expression levels, or in normalized channel intensities, between experimental conditions when replicate measurements are available. Many methods have been proposed to detect such differences and to assign significance values as a measure of confidence. We compare several of them, including SAM, the regularized t-test, mixture modeling, Wilks' lambda score and variance stabilization. Building on this comparison, we developed a weighted resampling approach and applied it to gene deletions in Mycobacterium bovis.

RESULTS: We discuss each method's assumptions, model structure, computational complexity and applicability to microarray data. The results of our study justify the theoretical basis of the weighted resampling approach, which clearly outperformed the other methods.
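Several of the compared methods assign significance by resampling. A minimal sketch of that general idea follows: it computes a SAM-style relative difference d = (mean_x - mean_y) / (s + s0) for a single gene and estimates a per-gene p-value by permuting condition labels. The helper names `sam_d` and `permutation_pvalue` are hypothetical, and the fudge constant `s0` is fixed by assumption (SAM estimates it from the data). This is a simplified illustration, not SAM's full procedure, which compares ordered statistics with their permutation expectations and estimates a false discovery rate. It is also not the weighted resampling approach of this paper, whose weighting scheme the abstract does not specify.

```python
import math
import random


def sam_d(x, y, s0=0.1):
    """SAM-style relative difference for one gene.

    x, y: replicate (log-)intensities for the two conditions.
    s0: small positive fudge constant; fixed here as an assumption.
    """
    n1, n2 = len(x), len(y)
    mx, my = sum(x) / n1, sum(y) / n2
    # Pooled within-condition sum of squares.
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    # Standard error of the difference in means.
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (mx - my) / (s + s0)


def permutation_pvalue(x, y, n_perm=1000, s0=0.1, rng=None):
    """Hypothetical helper: fraction of random label permutations whose
    |d| is at least the observed |d| (a simplified per-gene null)."""
    rng = rng or random.Random(0)
    observed = abs(sam_d(x, y, s0))
    pooled = list(x) + list(y)
    n1 = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign replicates to conditions at random
        if abs(sam_d(pooled[:n1], pooled[n1:], s0)) >= observed:
            exceed += 1
    # Adding 1 to numerator and denominator counts the observed labelling.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    gen = random.Random(1)
    control = [gen.gauss(0, 1) for _ in range(4)]
    shifted = [gen.gauss(3, 1) for _ in range(4)]
    print(permutation_pvalue(control, shifted, n_perm=500))
```

With only four replicates per condition there are C(8, 4) = 70 distinct labellings, so the smallest attainable p-value is coarse. This is one reason why SAM pools permutation statistics across genes rather than testing each gene in isolation.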
