Studying the Evolution of Promoter Sequences: A Waiting Time Problem

To better understand the evolutionary dynamics of regulatory DNA sequences, we address two questions: (1) How long does it take until a given transcription factor (TF) binding site emerges at random in a promoter sequence? (2) How does the composition of a TF binding site affect this waiting time? Using two probabilistic models (an i.i.d. model and a neighbor-dependent model), we compute the expected waiting time until each k-mer, with k ranging from 5 to 10, appears in a promoter of a species. Our findings indicate that new TF binding sites can be created on a short evolutionary time scale, i.e., in a time span shorter than the speciation time of human and chimp. Furthermore, the composition of a TF binding site plays a crucial role in the waiting time until it appears, and the CpG methylation-deamination substitution process probably accelerates the creation of new TF binding sites. A screening of existing TF binding sites moreover reveals that k-mers predicted to have short waiting times occur more frequently than others. Supplementary Material is available at www.libertonline.com/cmb.
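To make the dependence of the waiting time on k-mer composition concrete, the following is a minimal sketch of a first-order estimate under an i.i.d. background. It is not the method of the paper: it only counts promoter windows that are exactly one substitution away from the target k-mer, and it ignores overlapping windows, the chance that the k-mer is already present, and neighbor-dependent (CpG) effects. The function names, the promoter length, the per-site rate `mu`, and the assumption that a substitution goes to each of the three other bases with equal probability are all illustrative.

```python
def kmer_prob(kmer, freqs):
    """Probability of a k-mer at a fixed position under an i.i.d. base model."""
    p = 1.0
    for base in kmer:
        p *= freqs[base]
    return p


def one_step_neighbors(kmer, alphabet="ACGT"):
    """Yield every sequence that differs from kmer at exactly one position."""
    for i, base in enumerate(kmer):
        for other in alphabet:
            if other != base:
                yield kmer[:i] + other + kmer[i + 1:]


def approx_waiting_time(kmer, freqs, promoter_len, mu):
    """First-order waiting time estimate (hypothetical helper).

    Assumptions: i.i.d. background with base frequencies `freqs`;
    each site mutates at rate `mu` per unit time, choosing one of the
    three other bases uniformly; only windows one substitution away
    from the target contribute. The result is in the time units of mu.
    """
    windows = promoter_len - len(kmer) + 1
    # Probability that a given window is one substitution away from the target.
    p_neighbor = sum(kmer_prob(n, freqs) for n in one_step_neighbors(kmer))
    # Each such window turns into the target at rate mu / 3.
    rate = windows * p_neighbor * mu / 3.0
    return 1.0 / rate


if __name__ == "__main__":
    # Illustrative AT-rich background; values are assumptions, not data.
    freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
    for kmer in ("AAAAA", "CCCCC"):
        print(kmer, approx_waiting_time(kmer, freqs, promoter_len=1000, mu=1e-8))
```

Under these assumptions, a k-mer built from frequent bases has more near-miss windows in the promoter and therefore a shorter estimated waiting time, which is the qualitative composition effect described above.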
