Identifying cycling genes by combining sequence homology and expression data

MOTIVATION The expression of genes during cell division has now been studied in many different species. An important goal of these studies is to identify the set of cycling genes. To date, this has been done independently for each species studied. Because of noise and other data analysis problems, accurately deriving a set of cycling genes from expression data alone is hard. This is especially true for some multicellular organisms, including humans.

RESULTS Here we present the first algorithm that combines microarray expression data from multiple species to identify cycling genes. Our algorithm represents genes from multiple species as nodes in a graph, with edges between genes representing sequence similarity. Starting from the measured expression values for each species, we use belief propagation to compute a posterior score for each gene. This posterior is then used to determine a new set of cycling genes for each species. We applied our algorithm to improve the identification of cell cycle genes in budding yeast and humans. As we show, incorporating sequence similarity information yields a more accurate set of genes than methods that rely on expression data alone. Our method was especially successful on the human dataset, indicating that it can use a high-quality dataset from one species to overcome noise in another.

AVAILABILITY A C implementation is available from the supporting website: http://www.cs.cmu.edu/~lyongu/pub/cellcycle/.
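The graph-based idea described under RESULTS can be illustrated with a minimal sketch of loopy belief propagation on a binary pairwise model. This is not the authors' C implementation: the function name `loopy_bp`, the `coupling` parameter (an assumed probability that two sequence-similar genes share the same cycling state), and the symmetric edge potential are all hypothetical choices made for illustration. The abstract does not specify the potentials actually used.

```python
def _local_product(i, phi, neighbors, msgs, exclude=None):
    """Multiply node i's expression-based potential by incoming messages,
    skipping the message that came from `exclude` (if given)."""
    b0, b1 = phi[i]
    for k in neighbors[i]:
        if k != exclude:
            m0, m1 = msgs[(k, i)]
            b0, b1 = b0 * m0, b1 * m1
    return b0, b1


def loopy_bp(prior, edges, coupling=0.8, n_iter=50, tol=1e-9):
    """Sketch of belief propagation over a gene similarity graph.

    prior    : dict gene -> P(cycling) from expression data alone,
               assumed to lie strictly between 0 and 1
    edges    : list of (gene_a, gene_b) sequence-similarity pairs
    coupling : ASSUMED probability that similar genes share a state
    Returns a dict gene -> posterior P(cycling).
    """
    # Node potentials: (P(not cycling), P(cycling)).
    phi = {g: (1.0 - p, p) for g, p in prior.items()}
    # Hypothetical symmetric edge potential that favours agreement.
    psi = ((coupling, 1.0 - coupling), (1.0 - coupling, coupling))

    neighbors = {g: [] for g in prior}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # One message per directed edge, initialised to uniform.
    msgs = {(i, j): (0.5, 0.5) for i in neighbors for j in neighbors[i]}

    for _ in range(n_iter):
        new = {}
        for (i, j) in msgs:
            b0, b1 = _local_product(i, phi, neighbors, msgs, exclude=j)
            # Sum over the sender's state to get a message over the receiver's state.
            m0 = psi[0][0] * b0 + psi[1][0] * b1
            m1 = psi[0][1] * b0 + psi[1][1] * b1
            z = m0 + m1
            new[(i, j)] = (m0 / z, m1 / z)
        delta = max((abs(new[e][1] - msgs[e][1]) for e in msgs), default=0.0)
        msgs = new
        if delta < tol:
            break

    posterior = {}
    for g in prior:
        b0, b1 = _local_product(g, phi, neighbors, msgs)
        posterior[g] = b1 / (b0 + b1)
    return posterior
```

For example, with made-up scores `prior = {"human_gene": 0.4, "yeast_gene": 0.95, "isolated_gene": 0.3}` and a single edge between `"human_gene"` and `"yeast_gene"`, the posterior for the human gene rises above its expression-only score, while the gene with no similarity edges keeps its prior. This mirrors the abstract's point that a high-quality dataset in one species can compensate for noise in another, under the assumed coupling strength.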
