MICROARRAY EXPERIMENTS: APPLICATION TO SPORULATION TIME SERIES

A series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. It is often not clear whether a set of experiments is measuring fundamentally different gene expression states or similar states created through different mechanisms. It is useful, therefore, to define a core set of independent features for the expression states that allows them to be compared directly. Principal components analysis (PCA) is a statistical technique for determining the key variables in a multidimensional data set that explain the differences in the observations, and it can be used to simplify the analysis and visualization of multidimensional data sets. We show that application of PCA to expression data (where the experimental conditions are the variables and the gene expression measurements are the observations) allows us to summarize the ways in which gene responses vary under different conditions. Examination of the components also provides insight into the underlying factors that are measured in the experiments. We applied PCA to the publicly released yeast sporulation data set (Chu et al. 1998). In that work, 7 different measurements of gene expression were made over time. PCA on the time points suggests that much of the observed variability in the experiment can be summarized in just 2 components; that is, 2 variables capture most of the information. These components appear to represent (1) overall induction level and (2) change in induction level over time. We also examined the clusters proposed in the original paper and show how they are manifested in principal component space. Our results are available on the internet at http://www.smi.stanford.edu/project/helix/PCArray.
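The setup described above, with conditions as variables and genes as observations, can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the data are synthetic stand-ins generated from a per-gene level and a per-gene slope (they are not the Chu et al. measurements), the function name `pca` and all variable names are hypothetical, and the choice to center each time point without rescaling it is an assumption, since the abstract does not say which normalization was used.

```python
import numpy as np


def pca(expression):
    """PCA with conditions (columns) as variables and genes (rows) as observations."""
    x = np.asarray(expression, dtype=float)
    centered = x - x.mean(axis=0)  # center each condition (column); no rescaling
    # SVD of the centered matrix: the rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / (x.shape[0] - 1)  # eigenvalues of the covariance matrix
    explained = variances / variances.sum()  # fraction of variance per component
    scores = centered @ vt.T  # each gene's coordinates in component space
    return vt, explained, scores


# Synthetic stand-in data (an assumption, NOT the sporulation data set):
# 500 genes measured at 7 time points.
rng = np.random.default_rng(0)
n_genes, n_times = 500, 7
t = np.linspace(-1.0, 1.0, n_times)
level = rng.normal(0.0, 2.0, size=(n_genes, 1))  # overall induction level per gene
slope = rng.normal(0.0, 1.0, size=(n_genes, 1))  # change in induction over time
noise = rng.normal(0.0, 0.2, size=(n_genes, n_times))
data = level + slope * t + noise

components, explained, scores = pca(data)
print("variance explained by each component:", np.round(explained, 3))
```

Because the synthetic data are built from only two underlying factors plus small noise, the first two components account for nearly all of the variance here. On real expression data, the variance fractions indicate how many components are worth keeping, and plotting the first two columns of `scores` gives a view of the genes in principal component space.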

[1]  Karl Pearson, F.R.S. LIII. On lines and planes of closest fit to systems of points in space, 1901.

[2]  Brian Everitt, et al. Cluster analysis, 1974.

[3]  B. Everitt, et al. Applied Multivariate Data Analysis, 1993.

[4]  L. Xu, et al. NDT80, a meiosis-specific gene required for exit from pachytene in Saccharomyces cerevisiae, 1995, Molecular and cellular biology.

[5]  Ronald W. Davis, et al. Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray, 1995, Science.

[6]  J. E. Jackson, et al. Statistical Factor Analysis and Related Methods: Theory and Applications, 1995.

[7]  James L. Winkler, et al. Accessing Genetic Information with High-Density DNA Arrays, 1996, Science.

[8]  J. A. Calvin, et al. Developmental expression of morphoregulatory genes in the mouse embryo: an analytical approach using a novel technology, 1997, Biochemical and molecular medicine.

[9]  J. Vohradský, et al. Identification of procaryotic developmental stages by statistical analyses of two‐dimensional gel patterns, 1997, Electrophoresis.

[10]  J. Barker, et al. Large-scale temporal gene expression mapping of central nervous system development, 1998, Proceedings of the National Academy of Sciences of the United States of America.

[11]  S H Lai, et al. Novel local PCA-based method for detecting activation signals in fMRI, 1998, Medical Imaging.

[12]  I. Jonassen, et al. Predicting gene regulatory elements in silico on a genomic scale, 1998, Genome research.

[13]  D. Botstein, et al. The transcriptional program of sporulation in budding yeast, 1998, Science.

[14]  David Botstein, et al. SGD: Saccharomyces Genome Database, 1998, Nucleic Acids Res.

[15]  Michael Ruogu Zhang, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, 1998, Molecular biology of the cell.

[16]  G S Michaels, et al. Cluster analysis and data visualization of large-scale gene expression data, 1998, Pacific Symposium on Biocomputing.

[17]  J. J. Chen, et al. Profiling expression patterns and isolating differentially expressed genes by cDNA microarray system with colorimetry detection, 1998, Genomics.

[18]  A. Dunker. The pacific symposium on biocomputing, 1998.

[19]  S Fuhrman, et al. Reveal, a general reverse engineering algorithm for inference of genetic network architectures, 1998, Pacific Symposium on Biocomputing.

[20]  M. Bittner, et al. Expression profiling using cDNA microarrays, 1999, Nature Genetics.

[21]  R. Franklin, et al. Characterization of microbial communities using randomly amplified polymorphic DNA (RAPD), 1999, Journal of microbiological methods.

[22]  S. Hilsenbeck, et al. Statistical analysis of array expression data as applied to the problem of tamoxifen resistance, 1999, Journal of the National Cancer Institute.

[23]  D. Botstein, et al. Cluster analysis and display of genome-wide expression patterns, 1998, Proceedings of the National Academy of Sciences of the United States of America.

[24]  Patrik D'haeseleer, et al. Linear Modeling of mRNA Expression Levels During CNS Development and Injury, 1998, Pacific Symposium on Biocomputing.

[25]  E. Lander. Array of hope, 1999, Nature Genetics.

[26]  Eric R. Ziegel, et al. Applied Multivariate Data Analysis, 2002, Technometrics.