Micro Array Based Gene Expression Analysis using Parametric Multivariate Tests per Gene - A Generalized Application of Multiple Procedures with Data-driven Order of Hypotheses

Summary

Microarray technology allows the simultaneous analysis of tens of thousands of genes. Most often, however, the analysis is based on only a few replications. This causes problems in the application of classical multivariate tests, which require sample sizes exceeding the number of observed variables. To overcome these problems, a class of stable multivariate procedures based on the theory of spherical distributions has been proposed by Läuter, Glimm, and Kropf (1996). These methods allow the use of multivariate information from many genes for testing differential gene expression. Furthermore, multiple testing procedures based on these principles have been constructed (e.g., Kropf and Läuter, 2002), which strictly control the familywise type I error rate (FWE).

In this paper, these methods are generalised to allow for the use of the full multivariate information on expression intensities of individual genes analysed by the Affymetrix GeneChip technology. The usual strategy constructs an expression score for each gene by averaging the different oligonucleotide (perfect match and mismatch) information and then performs some test on these summarised expression values. In contrast, we suggest using a test procedure based on the complete multivariate perfect match information. We show that a multiple FWE-controlling procedure for normally distributed data proposed by Westfall, Kropf, and Finos (2004) can be generalised to a more powerful procedure based on left-spherically distributed scores derived from the perfect match information, without losing the FWE-controlling property.

To illustrate the proposed test procedures, which have been implemented in the statistical programming environment R, we analyse two previously published data sets, comparing gene expression of tumour and healthy tissues within identical patients and between two groups of different patients, respectively. Using these examples, we demonstrate that incorporating the multivariate perfect match information is superior to classical expression-score-based methods with respect to the number of identifiable differentially expressed genes.
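The idea of a multiple procedure with a data-driven order of hypotheses can be illustrated with a minimal Python sketch. It is a deliberately simplified, univariate illustration in the spirit of the ordered procedures cited above, not the multivariate per-gene procedure of this paper and not the authors' R implementation. It assumes a paired design in which each gene is represented by a list of within-patient differences, assumes that genes are ordered by their uncentred sum of squares, and assumes that each gene is then tested in that order by an unadjusted one-sample t test until the first non-rejection. The function names and the example data are hypothetical, and the critical value must be supplied by the caller.

```python
import math


def sum_of_squares(diffs):
    """Uncentred sum of squares; used here only to order the genes (assumption)."""
    return sum(d * d for d in diffs)


def t_statistic(diffs):
    """One-sample t statistic for the hypothesis that the mean difference is zero."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        # Degenerate case: no variation in the differences.
        return math.inf if mean != 0.0 else 0.0
    return mean / math.sqrt(var / n)


def ordered_procedure(data, t_crit):
    """Test genes in decreasing order of their sum of squares.

    data   : dict mapping a gene name to its list of paired differences
    t_crit : two-sided critical value of the t distribution with n - 1
             degrees of freedom at the unadjusted level alpha
    Returns the testing order and the list of rejected genes.
    Testing stops at the first gene that is not significant.
    """
    order = sorted(data, key=lambda g: sum_of_squares(data[g]), reverse=True)
    rejected = []
    for gene in order:
        if abs(t_statistic(data[gene])) <= t_crit:
            break  # first non-rejection ends the procedure
        rejected.append(gene)
    return order, rejected


# Hypothetical toy data: three genes, four patients each (paired differences).
example = {
    "g1": [2.0, 2.2, 1.8, 2.1],
    "g2": [0.1, -0.2, 0.15, -0.05],
    "g3": [1.0, 1.3, 0.9, 1.1],
}
# 3.182 is the two-sided 5% critical value of the t distribution with 3 df.
order, rejected = ordered_procedure(example, t_crit=3.182)
print(order, rejected)
```

Because the order is fixed before testing and each test is carried out at the full level, no Bonferroni-type adjustment appears in the sketch. Whether this keeps the FWE depends on the distributional assumptions discussed in the cited papers.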

[1]  S. Kropf, et al. Multivariate tests based on left-spherically distributed linear scores, 1998.

[2]  S. Holm. A Simple Sequentially Rejective Multiple Test Procedure, 1979.

[3]  N. Patil, et al. DNA hybridization to mismatched templates: a chip study, 2001, Physical review. E, Statistical, nonlinear, and soft matter physics.

[4]  J. Booth, et al. Resampling-Based Multiple Testing, 1994.

[5]  Ross Ihaka, Robert Gentleman. R: A language for data analysis and graphics, 1996.

[6]  Jürgen Läuter, et al. Exact t and F Tests for Analyzing Studies with Multiple Endpoints, 1996.

[7]  P. Westfall, et al. Optimally weighted, fixed sequence and gatekeeper multiple testing procedures, 2001.

[8]  P. Stadler, et al. Sensitivity of Microarray Oligonucleotide Probes: Variability and Effect of Base Composition, 2004.

[9]  E. Lander, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Felix Naef, et al. Absolute mRNA concentrations from sequence-specific calibration of oligonucleotide arrays, 2003, Nucleic acids research.

[11]  R. Paschke, et al. Complementary DNA expression array analysis suggests a lower expression of signal transduction proteins and receptors in cold and hot thyroid nodules, 2001, The Journal of clinical endocrinology and metabolism.

[12]  S. Kropf, et al. Multiple Tests for Different Sets of Variables Using a Data‐Driven Ordering of Hypotheses, with an Application to Gene Expression Data, 2002.

[13]  Jürgen Läuter, et al. New multivariate tests for data with an inherent structure, 1996.

[14]  Mikhail Nikulin, et al. Statistical planning and inference in accelerated life testing using the CHSS model, 2004.