A mutation degree model for the identification of transcriptional regulatory elements

Background
Current approaches for identifying transcriptional regulatory elements mainly combine two properties: evolutionary conservation and the overrepresentation of functional elements in the promoters of co-regulated genes. Despite the development of many motif detection algorithms, discovering conserved motifs across a wide range of phylogenetically related promoters remains a challenge. This is especially true for short motifs embedded in the promoters of either distantly related or very closely related genes, or when too few orthologous genes are available.

Results
A mutation degree model is proposed, and a new word-counting method based on it is developed for identifying transcriptional regulatory elements from a set of co-expressed genes. The method comprises two parts: 1) identifying overrepresented oligonucleotides in the promoters of co-expressed genes, and 2) estimating the conservation of these oligonucleotides in the promoters of phylogenetically related genes with the mutation degree model. Compared with other algorithms, our method shows a lower false positive rate, higher specificity, and in particular robustness to noisy data. When the method was applied to co-expressed gene sets from Arabidopsis, most known cis-elements were successfully detected. The tool and an example are available at http://mcube.nju.edu.cn/jwang/lab/soft/ocw/OCW.html.

Conclusions
The mutation degree model proposed in this paper adapts to phylogenetic data of varying quality and to a wide range of evolutionary distances. The word-counting method built on this model performs better at detecting short cis-element sequences in co-expressed eukaryotic genes and is robust to incomplete phylogenetic data.