MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model

We introduce a method (MONKEY) to identify conserved transcription-factor binding sites in multispecies alignments. MONKEY employs probabilistic models of factor specificity and binding-site evolution, which we use to compute the likelihood that putative sites are conserved and to assign statistical significance to each hit. Using genomes from the genus Saccharomyces, we illustrate how the significance of real sites increases with evolutionary distance and explore the relationship between conservation and function.
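
To make the idea of scoring a site under a binding-site-specific evolutionary model concrete, the following is a minimal sketch and not the MONKEY implementation. It assumes a pairwise (two-species) alignment and swaps in a simple F81-style substitution process whose stationary frequencies are the columns of the factor's frequency matrix. The abstract does not specify the evolutionary model, so this choice is an illustrative stand-in. The function names, the toy matrix `toy_pwm`, the uniform `toy_background`, and the branch-length parameter `t` are all hypothetical.

```python
import math

def f81_transition(a, b, freqs, t):
    """P(b | a, t) under an F81-style process with stationary frequencies `freqs`.

    With probability exp(-t) no substitution event occurs; otherwise the base
    is redrawn from `freqs`. This is an assumed stand-in model.
    """
    stay = math.exp(-t)
    return stay * (1.0 if a == b else 0.0) + (1.0 - stay) * freqs[b]

def pair_log_likelihood(seq1, seq2, columns, t):
    """Log-probability of two aligned, ungapped sequences.

    `columns` holds one base-frequency dict per aligned position; each position
    evolves independently with that dict as its stationary distribution.
    """
    total = 0.0
    for a, b, freqs in zip(seq1, seq2, columns):
        total += math.log(freqs[a] * f81_transition(a, b, freqs, t))
    return total

def conserved_site_score(seq1, seq2, pwm, background, t):
    """Log-likelihood ratio: motif-specific model versus background model."""
    motif = pair_log_likelihood(seq1, seq2, pwm, t)
    bg = pair_log_likelihood(seq1, seq2, [background] * len(pwm), t)
    return motif - bg

# Hypothetical 3-bp frequency matrix (no zero entries) and uniform background.
toy_pwm = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]
toy_background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

if __name__ == "__main__":
    for s1, s2 in [("ACG", "ACG"), ("ACG", "ATG"), ("TTT", "TTT")]:
        print(s1, s2, conserved_site_score(s1, s2, toy_pwm, toy_background, t=0.5))
```

In this toy setting, a conserved match to the matrix scores above zero, a match disrupted in one species scores lower, and a conserved non-matching word scores below zero. The score of a perfectly conserved match also grows with `t`, which is a property of this simplified model rather than a reproduction of the significance calculation described above.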
