Analysis of variance components in gene expression data

MOTIVATION A microarray experiment is a multi-step process, and each step is a potential source of variation. There are two major sources of variation: biological variation and technical variation. This study presents a variance-components approach to investigating animal-to-animal, between-array, within-array and day-to-day variations for two data sets. The first data set involved estimation of technical variances for pooled control and pooled treated RNA samples. The variance components included the between-array variance and two nested within-array variances: between-section (the upper and lower sections of the array are replicates) and within-section (two adjacent spots of the same gene are printed within each section). The second data set came from an experiment conducted in four different weeks. Each week there were reference and test samples with a dye-flip replicate on two hybridization days. The variance components included week-to-week, animal-to-animal, between-array and within-array variances.

RESULTS We applied a linear mixed-effects model to quantify the different sources of variation. In the first data set, the between-array variance is greater than the between-section variance, which, in turn, is greater than the within-section variance. In the second data set, for the reference samples, the week-to-week variance is larger than the between-array variance, which, in turn, is slightly larger than the within-array variance. For the test samples, the week-to-week variance is the largest, and the animal-to-animal variance is slightly larger than the between-array and within-array variances. However, in a gene-by-gene analysis, the animal-to-animal variance is smaller than the between-array variance in four out of five housekeeping genes. In summary, the largest variation observed is the week-to-week effect, and another important source of variability is the animal-to-animal variation.
Finally, we describe the use of variance-component estimates to determine optimal numbers of animals, arrays per animal and sections per array in planning microarray experiments.
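
To illustrate how variance-component estimates can feed into design planning, the following is a minimal sketch rather than the procedure used in the study. It assumes a balanced, fully nested design (animals, arrays within animal, sections within array, spots within section), in which the variance of a group mean is the sum of each component divided by the number of units at that level and above. The function names, the cost model, the budget and all numeric values are hypothetical and chosen only for illustration.

```python
from itertools import product


def variance_of_mean(components, n_animals, n_arrays, n_sections, n_spots=2):
    """Variance of a group mean under a balanced, fully nested design.

    components: dict with keys 'animal', 'array', 'section', 'spot'
    holding variance-component estimates (values below are made up).
    """
    units = n_animals
    total = components["animal"] / units
    units *= n_arrays
    total += components["array"] / units
    units *= n_sections
    total += components["section"] / units
    units *= n_spots
    total += components["spot"] / units
    return total


def best_design(components, budget, cost_animal, cost_array,
                max_animals=20, max_arrays=6, section_options=(1, 2)):
    """Exhaustive search for the lowest-variance design within a budget.

    Hypothetical cost model: each animal costs cost_animal and each array
    costs cost_array; sections and spots are assumed to cost nothing.
    Returns (variance, n_animals, n_arrays, n_sections), or None if no
    design fits the budget.
    """
    best = None
    for n_animals, n_arrays, n_sections in product(
            range(1, max_animals + 1),
            range(1, max_arrays + 1),
            section_options):
        cost = n_animals * (cost_animal + n_arrays * cost_array)
        if cost > budget:
            continue
        var = variance_of_mean(components, n_animals, n_arrays, n_sections)
        candidate = (var, n_animals, n_arrays, n_sections)
        if best is None or candidate < best:
            best = candidate
    return best


# Illustrative numbers only, not estimates reported in the study.
example = {"animal": 0.04, "array": 0.02, "section": 0.01, "spot": 0.005}
design = best_design(example, budget=100, cost_animal=10, cost_array=5)
print(design)
```

Under this kind of model, a large animal-to-animal component favours spending the budget on more animals rather than more arrays per animal, while a dominant between-array component shifts the balance toward technical replication.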
