A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription

We have analyzed microarray data using a modeling approach based on the multivariate statistical method partial least squares (PLS) regression to identify genes whose expression levels fluctuate periodically with the cell cycle in the budding yeast, Saccharomyces cerevisiae. PLS has major advantages for analyzing microarray data because it can model data sets with many variables and few observations. A response model describing the expression profile over time expected for periodically transcribed genes was derived and used to identify budding yeast transcripts with similar profiles. PLS was then used to interpret the importance of the variables (genes) for the model, yielding a ranked list of how well each gene fits the generated model. Applying an appropriate cutoff value, calculated from randomized data, allows the identification of genes whose expression appears to be synchronized with the cell cycle. Our approach also provides information about the stage of the cell cycle at which transcription of each candidate gene peaks. Three microarray data sets from synchronized yeast cells were analyzed, both separately and combined. Cell cycle-coupled periodicity was suggested for 455 of the 6,178 transcripts monitored in the combined data set, at a significance level of 0.5%. These candidates included 85% of the known periodic transcripts. Analysis of the three data sets separately yielded similar ranking lists, showing that the method is robust.
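The ranking-and-cutoff idea can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes a simple sine/cosine response model with a known period, and it uses only the fact that the first PLS weight vector is proportional to X'y, so that for autoscaled expression profiles each gene's weight is proportional to its correlation with the response. The function names, the `period` parameter, the number of shuffling rounds, and the synthetic profiles are all hypothetical.

```python
import math
import random


def autoscale(values):
    """Center to zero mean and scale to unit length (constant profiles become zeros)."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered] if norm > 0 else centered


def periodic_score(profile, times, period):
    """Return (score, peak_time) for one gene's expression profile.

    The score combines the gene's correlations with a sine and a cosine
    response of the assumed period; the peak time is read from their ratio.
    """
    x = autoscale(profile)
    s = autoscale([math.sin(2 * math.pi * t / period) for t in times])
    c = autoscale([math.cos(2 * math.pi * t / period) for t in times])
    a = sum(xi * si for xi, si in zip(x, s))  # weight on the sine response
    b = sum(xi * ci for xi, ci in zip(x, c))  # weight on the cosine response
    score = math.hypot(a, b)
    peak = (math.atan2(a, b) * period / (2 * math.pi)) % period
    return score, peak


def randomized_cutoff(profiles, times, period, alpha=0.005, rounds=20, seed=0):
    """Score time-shuffled profiles and return the (1 - alpha) quantile."""
    rng = random.Random(seed)
    null_scores = []
    for _ in range(rounds):
        for profile in profiles:
            shuffled = list(profile)
            rng.shuffle(shuffled)  # destroys any time structure
            null_scores.append(periodic_score(shuffled, times, period)[0])
    null_scores.sort()
    index = min(len(null_scores) - 1,
                max(0, math.ceil((1 - alpha) * len(null_scores)) - 1))
    return null_scores[index]


if __name__ == "__main__":
    # Synthetic data only: one periodic gene peaking at t = 20, plus noise genes.
    times = list(range(0, 120, 10))
    period = 60.0
    rng = random.Random(1)
    periodic = [math.cos(2 * math.pi * (t - 20) / period) for t in times]
    noise = [[rng.gauss(0, 1) for _ in times] for _ in range(50)]
    profiles = [periodic] + noise

    cutoff = randomized_cutoff(profiles, times, period)
    ranked = sorted(
        ((periodic_score(p, times, period), i) for i, p in enumerate(profiles)),
        reverse=True,
    )
    for (score, peak), gene in ranked[:5]:
        print(gene, round(score, 3), round(peak, 1), score > cutoff)
```

Shuffling each profile over time keeps its distribution of expression values while removing periodicity, which is one plausible way to obtain a cutoff "calculated from randomized data"; the abstract does not specify the randomization scheme, so this choice is an assumption. A full analysis would fit a multi-component PLS model rather than rely on first-component weights alone.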
