A bi-dimensional regression tree approach to the modeling of gene expression regulation

MOTIVATION: The transcriptional regulation of a gene depends on the binding of transcription factors to cis-regulatory elements in its promoter and on the expression levels of those transcription factors. Most existing approaches to studying transcriptional regulation model these dependencies separately, i.e. either from promoters to gene expression or from the expression levels of transcription factors to the expression levels of genes. Little effort has been devoted to a single model that integrates both dependencies.

RESULTS: We propose a novel method to model gene expression using both promoter sequences and the expression levels of putative regulators. The proposed method, called bi-dimensional regression tree (BDTree), extends a multivariate regression tree approach by applying it simultaneously to both the genes and the conditions of an expression matrix. The method produces hypotheses about the condition-specific binding motifs and regulators for each gene. As a by-product, the method also partitions the expression matrix into small submatrices, in a way similar to bi-clustering. We propose and compare several splitting functions for building the tree. When applied to two microarray datasets of the yeast Saccharomyces cerevisiae, BDTree successfully identifies most motifs and regulators that are known to regulate the biological processes underlying the datasets. Compared with an existing algorithm, BDTree achieves higher prediction accuracy in cross-validation.
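The idea of splitting an expression matrix along both dimensions can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the splitting functions, so the sketch assumes a plain reduction in the sum of squared errors, binary motif presence for gene-side splits, and a regulator expression threshold of 0 for condition-side splits. All names (`sse`, `best_split`, `build_tree`, `leaves`) and the toy data are hypothetical.

```python
def sse(expr, genes, conds):
    """Sum of squared deviations from the mean of the submatrix genes x conds."""
    vals = [expr[g][c] for g in genes for c in conds]
    if not vals:
        return 0.0
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)


def best_split(expr, motifs, regs, genes, conds):
    """Return (gain, kind, index, part_a, part_b) for the best split, or None."""
    base = sse(expr, genes, conds)
    best = None
    # Gene-side candidates: genes whose promoter has motif m vs. those without.
    for m in range(len(motifs[0])):
        has = [g for g in genes if motifs[g][m]]
        lacks = [g for g in genes if not motifs[g][m]]
        if has and lacks:
            gain = base - sse(expr, has, conds) - sse(expr, lacks, conds)
            if best is None or gain > best[0]:
                best = (gain, "motif", m, has, lacks)
    # Condition-side candidates: regulator r above vs. not above 0 (assumed threshold).
    for r in range(len(regs)):
        up = [c for c in conds if regs[r][c] > 0]
        down = [c for c in conds if regs[r][c] <= 0]
        if up and down:
            gain = base - sse(expr, genes, up) - sse(expr, genes, down)
            if best is None or gain > best[0]:
                best = (gain, "regulator", r, up, down)
    return best


def build_tree(expr, motifs, regs, genes, conds, min_gain=1e-9):
    """Recursively split the submatrix until no split improves the fit."""
    split = best_split(expr, motifs, regs, genes, conds)
    if split is None or split[0] <= min_gain:
        return {"leaf": True, "genes": genes, "conds": conds}
    _, kind, idx, a, b = split
    if kind == "motif":
        parts = [(a, conds), (b, conds)]   # split the rows (genes)
    else:
        parts = [(genes, a), (genes, b)]   # split the columns (conditions)
    children = [build_tree(expr, motifs, regs, g, c, min_gain) for g, c in parts]
    return {"leaf": False, "kind": kind, "index": idx, "children": children}


def leaves(node):
    """Collect the (genes, conds) submatrices at the leaves."""
    if node["leaf"]:
        return [(node["genes"], node["conds"])]
    return [leaf for child in node["children"] for leaf in leaves(child)]


# Toy data: genes 0 and 1 carry motif 0 and are induced when regulator 0 is up.
expr = [[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # genes x conditions
motifs = [[1], [1], [0], [0]]                                    # genes x motifs
regs = [[1, 1, -1, -1]]                                          # regulators x conditions

tree = build_tree(expr, motifs, regs, [0, 1, 2, 3], [0, 1, 2, 3])
for genes, conds in leaves(tree):
    print("genes", genes, "conditions", conds)
```

In this toy case the leaves are submatrices of the expression matrix, which mirrors the bi-clustering-like partition mentioned above, and the path to each leaf names the motif and regulator used to reach it. A realistic version would need the splitting functions, thresholds, and stopping rules described in the full paper.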
