Computational discovery of transcriptional regulatory rules

MOTIVATION: Even in a simple organism such as the yeast Saccharomyces cerevisiae, transcription is an extremely complex process. The expression of sets of genes can be turned on or off by the binding of specific transcription factors to the promoter regions of those genes. Experimental and computational approaches have been proposed to map the DNA-binding locations of transcription factors. However, location data obtained from experimental methods are noisy owing to imperfections in the measuring methods, while computational approaches suffer from over-prediction because the sequence motifs bound by transcription factors are short. In addition, these interactions are usually environment-dependent: many regulators bind to the promoter regions of genes only under specific environmental conditions. Moreover, the presence of a regulator at a promoter region indicates binding but not necessarily function: the regulator may act positively, negatively or not at all. Therefore, identifying true and functional interactions between transcription factors and genes under specific environmental conditions, and describing the relationships between them, remain open problems.

RESULTS: We developed a method that combines expression data with genomic location information to discover (1) the relevant transcription factors among the potential transcription factors of a target gene; and (2) the relationship between the expression behavior of a target gene and that of its relevant transcription factors. Our method is based on rule induction, a machine learning technique that can deal efficiently with noisy domains. When applied to genomic location data with the confidence criterion relaxed to P-value = 0.005 and to three different expression datasets of the yeast S. cerevisiae, the method produced a set of regulatory rules describing the relationship between the expression behavior of a specific target gene and that of its relevant transcription factors. The resulting rules provide strong evidence of true positive gene-regulator interactions, as well as of protein-protein interactions that could serve to identify transcription complexes.

AVAILABILITY: Supplementary files are available from http://www.jaist.ac.jp/~h-pham/regulatory-rules
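The two steps described under RESULTS can be illustrated with a small sketch. It is not the authors' algorithm: it uses a plain exhaustive search over short conjunctive rules instead of any particular rule-induction system, and every gene name, data value, discretization threshold and scoring cutoff below is a hypothetical assumption chosen for illustration. Only the P-value cutoff of 0.005 comes from the abstract.

```python
from itertools import combinations, product

P_VALUE_CUTOFF = 0.005  # relaxed confidence criterion stated in the abstract


def candidate_regulators(binding_pvalues, cutoff=P_VALUE_CUTOFF):
    """Keep regulators whose binding P-value for the target passes the cutoff."""
    return sorted(tf for tf, p in binding_pvalues.items() if p <= cutoff)


def discretize(log_ratio, threshold=1.0):
    """Map a log expression ratio to a state (threshold is an assumption)."""
    if log_ratio >= threshold:
        return "up"
    if log_ratio <= -threshold:
        return "down"
    return "flat"


def induce_rules(samples, regulators, target,
                 max_conditions=2, min_coverage=2, min_accuracy=0.8):
    """Enumerate rules of the form IF tf1=s1 AND ... THEN target=s.

    samples: list of dicts mapping gene name -> discretized state.
    Returns (conditions, predicted_state, coverage, accuracy), best first.
    """
    states = ("up", "down", "flat")
    rules = []
    for k in range(1, max_conditions + 1):
        for tfs in combinations(regulators, k):
            for values in product(states, repeat=k):
                conditions = tuple(zip(tfs, values))
                covered = [s for s in samples
                           if all(s[tf] == v for tf, v in conditions)]
                if len(covered) < min_coverage:
                    continue
                # Predict the most frequent target state among covered samples.
                counts = {}
                for s in covered:
                    counts[s[target]] = counts.get(s[target], 0) + 1
                predicted, hits = max(counts.items(), key=lambda kv: kv[1])
                accuracy = hits / len(covered)
                if accuracy >= min_accuracy:
                    rules.append((conditions, predicted, len(covered), accuracy))
    # Prefer accurate rules, then wide coverage, then fewer conditions.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


# Hypothetical location data: binding P-values of three regulators at one promoter.
binding_pvalues = {"TF_A": 0.001, "TF_B": 0.004, "TF_C": 0.2}

# Hypothetical log expression ratios under six conditions.
raw = [
    {"TF_A": 1.5, "TF_B": 0.1, "TARGET": 1.2},
    {"TF_A": 1.8, "TF_B": -0.2, "TARGET": 1.6},
    {"TF_A": -1.4, "TF_B": 0.3, "TARGET": -1.1},
    {"TF_A": -1.2, "TF_B": 1.3, "TARGET": -1.5},
    {"TF_A": 0.2, "TF_B": 1.1, "TARGET": 0.0},
    {"TF_A": 1.3, "TF_B": 1.4, "TARGET": 1.1},
]

regulators = candidate_regulators(binding_pvalues)
samples = [{gene: discretize(v) for gene, v in row.items()} for row in raw]
rules = induce_rules(samples, regulators, "TARGET")

for conditions, predicted, coverage, accuracy in rules:
    body = " AND ".join(f"{tf}={state}" for tf, state in conditions)
    print(f"IF {body} THEN TARGET={predicted} (coverage={coverage}, accuracy={accuracy:.2f})")
```

In this toy data TF_C is discarded by the location filter, and the surviving rules depend mainly on TF_A. That mirrors the idea that a regulator can be bound at a promoter (TF_B here) without its expression explaining the behavior of the target gene.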
