Prediction of tissue-specific cis-regulatory modules using Bayesian networks and regression trees

Background: In vertebrates, much of gene transcriptional regulation is carried out by cis-regulatory modules. These modules are believed to account for much of the tissue specificity of gene expression.

Results: We develop a Bayesian network approach for identifying cis-regulatory modules likely to regulate tissue-specific expression. The network integrates predicted transcription factor binding site information, transcription factor expression data, and target gene expression data. At its core is a regression tree that models the effect of combinations of transcription factors bound to a module. A new unsupervised EM-like algorithm is developed to learn the parameters of the network, including the regression tree structure.

Conclusion: Our approach is shown to accurately identify known human liver-specific and erythroid-specific modules. When applied to the prediction of tissue-specific modules in 10 different tissues, the network predicts a number of important transcription factor combinations whose concerted binding is associated with tissue-specific expression.
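To make the idea of a regression tree over transcription factor combinations concrete, here is a minimal sketch. It is not the paper's method: it is a plain supervised, greedy tree fitted to fully observed toy data, whereas the abstract describes an unsupervised EM-like procedure inside a Bayesian network. The factor labels `HNF4` and `GATA1`, the function names `build_tree` and `predict`, and all expression values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, factors, min_gain=1e-9):
    """Greedily grow a regression tree over binary 'factor bound' indicators.

    rows    : list of dicts mapping factor name -> 0/1 (hypothetical binding calls)
    targets : list of floats (hypothetical expression levels)
    factors : factor names still available for splitting
    """
    base = sse(targets)
    best_gain, best_factor = min_gain, None
    for f in factors:
        bound = [t for r, t in zip(rows, targets) if r[f]]
        unbound = [t for r, t in zip(rows, targets) if not r[f]]
        if not bound or not unbound:
            continue  # this factor does not separate the examples
        gain = base - sse(bound) - sse(unbound)
        if gain > best_gain:
            best_gain, best_factor = gain, f
    if best_factor is None:
        return {"leaf": mean(targets)}  # no useful split: predict the mean

    rest = [g for g in factors if g != best_factor]
    yes = [(r, t) for r, t in zip(rows, targets) if r[best_factor]]
    no = [(r, t) for r, t in zip(rows, targets) if not r[best_factor]]
    return {
        "factor": best_factor,
        "bound": build_tree([r for r, _ in yes], [t for _, t in yes], rest, min_gain),
        "unbound": build_tree([r for r, _ in no], [t for _, t in no], rest, min_gain),
    }


def predict(tree, row):
    """Walk from the root to a leaf, following the binding indicators in `row`."""
    while "leaf" not in tree:
        tree = tree["bound"] if row[tree["factor"]] else tree["unbound"]
    return tree["leaf"]


# Toy data (made up): expression is high only when both factors are bound.
rows = [
    {"HNF4": 0, "GATA1": 0},
    {"HNF4": 0, "GATA1": 1},
    {"HNF4": 1, "GATA1": 0},
    {"HNF4": 1, "GATA1": 1},
]
targets = [0.0, 0.0, 0.0, 4.0]

tree = build_tree(rows, targets, ["HNF4", "GATA1"])
both_bound = predict(tree, {"HNF4": 1, "GATA1": 1})
one_bound = predict(tree, {"HNF4": 1, "GATA1": 0})
```

Each leaf corresponds to one combination of bound and unbound factors, which is how a tree of this kind can represent concerted binding rather than independent additive effects. In the setting the abstract describes, the binding indicators and module activity would be uncertain quantities rather than observed inputs, so the tree would have to be learned jointly with the other network parameters.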
