Characterization of variability in large-scale gene expression data: implications for study design.

Large-scale gene expression measurement techniques provide a unique opportunity to gain insight into biological processes under normal and pathological conditions. To interpret changes in the expression profiles of thousands of genes, we face the nontrivial problem of assessing the significance of these changes. In practice, the sources of background variability in expression data can be divided into three categories: technical, physiological, and sampling. To assess the relative importance of these sources of background variation, we generated replicate gene expression profiles on high-density Affymetrix GeneChip oligonucleotide arrays, using either identical RNA samples or RNA samples obtained under similar biological states. We derived a novel measure of dispersion in two-way comparisons, using a linear characteristic function. When comparing expression profiles from replicate tests using the same RNA sample (a test for technical variability), we observed a level of dispersion similar to that obtained with RNA samples from replicate cultures of the same cell line (a test for physiological variability). In contrast, a higher level of dispersion was observed when tissue samples from different animals were compared (an example of sampling variability). This implies that, in experiments in which samples from different subjects are used, the variation induced by the stimulus may be masked by differences in the subjects' biological state that are unrelated to the stimulus. These analyses underscore the need for replicate experiments to reliably interpret large-scale expression data sets, even in simple microarray experiments.
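The comparison logic described above can be illustrated with a minimal sketch. The abstract does not define the dispersion measure based on a linear characteristic function, so the code below substitutes a simple stand-in, the median absolute log2 ratio between two profiles. The function name `log_ratio_dispersion`, the `floor` parameter, and all intensity values are hypothetical and chosen only for illustration.

```python
import math
import statistics


def log_ratio_dispersion(profile_a, profile_b, floor=1.0):
    """Median absolute log2 ratio between two expression profiles.

    This is a stand-in measure, NOT the measure derived in the paper.
    Intensities below `floor` are raised to `floor` so the ratio is defined.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genes in the same order")
    abs_log_ratios = []
    for a, b in zip(profile_a, profile_b):
        a = max(a, floor)
        b = max(b, floor)
        abs_log_ratios.append(abs(math.log2(a / b)))
    return statistics.median(abs_log_ratios)


# Hypothetical intensities for four genes (not data from the study).
reference = [100.0, 200.0, 400.0, 800.0]
same_rna_replicate = [110.0, 190.0, 420.0, 780.0]   # technical variability
different_animal = [200.0, 100.0, 800.0, 400.0]     # sampling variability

technical = log_ratio_dispersion(reference, same_rna_replicate)
sampling = log_ratio_dispersion(reference, different_animal)
print(f"technical: {technical:.3f}, sampling: {sampling:.3f}")
```

With these made-up values the same-RNA comparison yields a much smaller dispersion than the different-animal comparison, mirroring the qualitative pattern the abstract reports. A stimulus-induced change would need to exceed the relevant background dispersion to be distinguishable from it.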
