Increased power of microarray analysis by use of an algorithm based on a multivariate procedure

MOTIVATION: The power of microarray analyses to detect differential gene expression depends strongly on the statistical and bioinformatic approaches used for data analysis. Moreover, testing tens of thousands of genes simultaneously for differential expression raises the 'multiple testing problem', which increases the probability of obtaining false positive test results. To achieve more reliable results, it is therefore necessary to apply adjustment procedures that restrict the family-wise type I error rate (FWE) or the false discovery rate. For the biologist, however, the statistical power of such procedures often remains abstract unless it is validated by an alternative experimental approach.

RESULTS: In the present study, we discuss a multiplicity adjustment procedure applied to classical univariate as well as to recently proposed multivariate gene-expression scores. All procedures strictly control the FWE. We demonstrate that the use of multivariate scores leads to a more efficient identification of differentially expressed genes than the widely used MAS5 approach provided by the Affymetrix software tools (Affymetrix Microarray Suite 5 or GeneChip Operating Software). The practical importance of this finding is validated using real-time quantitative PCR and data from spike-in experiments.

AVAILABILITY: The R code of the statistical routines can be obtained from the corresponding author.

CONTACT: Schuster@imise.uni-leipzig.de
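
To make the notion of FWE control concrete, the following is a minimal sketch of the generic Holm step-down adjustment in Python. It is not the multiplicity adjustment procedure of this study, whose R routines are available from the corresponding author; the function names `holm_adjust` and `reject_holm` and the example p-values are assumptions chosen purely for illustration.

```python
from typing import List


def holm_adjust(p_values: List[float]) -> List[float]:
    """Return Holm step-down adjusted p-values (illustrative helper).

    Rejecting every hypothesis whose adjusted p-value is <= alpha
    controls the family-wise type I error rate at level alpha.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def reject_holm(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag the hypotheses rejected at family-wise level alpha."""
    return [p <= alpha for p in holm_adjust(p_values)]


if __name__ == "__main__":
    # Hypothetical raw p-values for four genes (not data from the study).
    raw = [0.01, 0.04, 0.03, 0.005]
    print(holm_adjust(raw))
    print(reject_holm(raw, alpha=0.05))
```

With tens of thousands of genes, the multipliers in such a generic adjustment become very large, which illustrates why the power of FWE-controlling procedures is a practical concern for microarray data.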
