Phylogenetic simulation of promoter evolution: estimation and modeling of binding site turnover events and assessment of their impact on alignment tools

Background

The phenomenon of functional site turnover has important implications for the study of regulatory region evolution, such as for promoter sequence alignments and transcription factor binding site (TFBS) identification. At present, it remains difficult to estimate TFBS turnover rates on real genomic sequences, as reliable mappings of functional sites across related species are often not available. As an alternative, we introduce a flexible new simulation system, Phylogenetic Simulation of Promoter Evolution (PSPE), designed to study functional site turnovers in regulatory sequences.

Results

Using PSPE, we study replacement turnover rates of different individual TFBSs and simple modules of two sites under neutral evolutionary functional constraints. We find that TFBS replacement turnover can happen rapidly in promoters, and turnover rates vary significantly among different TFBSs and modules. We assess the influence of different constraints such as insertion/deletion rate and translocation distances. Complementing the simulations, we give simple but effective mathematical models for TFBS turnover rate prediction. As one important application of PSPE, we also present a first systematic evaluation of multiple sequence aligners regarding their capability of detecting TFBSs in promoters with site turnovers.

Conclusion

PSPE allows researchers for the first time to investigate TFBS replacement turnovers in promoters systematically. The assessment of alignment tools points out the limitations of current approaches to identify TFBSs in non-coding sequences, where turnover events of functional sites may happen frequently, and where we are interested in assessing the similarity on the functional level. PSPE is freely available at the authors' website.
