Probabilistic discovery of overlapping cellular processes and their regulation

Many of the functions carried out by a living cell are regulated at the transcriptional level, to ensure that genes are expressed when they are needed. To understand biological processes, it is therefore necessary to understand the cell's transcriptional network. In this paper, we propose a novel probabilistic model of gene regulation for the task of identifying overlapping biological processes and the regulatory mechanisms controlling their activation. A key feature of our approach is that we allow genes to participate in multiple processes, thus providing a more biologically plausible model of gene regulation. We present an algorithm to learn this model automatically from data, using only genome-wide measurements of gene expression as input. We compare our results to those obtained by other approaches, and show that significant benefits can be gained by modeling both the organization of genes into overlapping cellular processes and the regulatory programs of these processes. Moreover, our method successfully grouped genes known to function together, recovered many regulatory relationships that are known in the literature, and suggested novel hypotheses regarding the regulatory role of previously uncharacterized proteins.
