MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes

Background: Cis-regulatory modules are combinations of regulatory elements occurring in close proximity to each other that control the spatial and temporal expression of genes. The ability to identify them genome-wide depends on the availability of accurate models and of search methods able to detect putative regulatory elements with enhanced sensitivity and specificity.

Results: We describe the implementation of a search method for putative transcription factor binding sites (TFBSs) based on hidden Markov models built from alignments of known sites. We built 1,079 models of TFBSs using experimentally determined sequence alignments of sites provided by the TRANSFAC and JASPAR databases and used them to scan sequences of the human, mouse, fly, worm and yeast genomes. In several test cases, the method correctly identified experimentally characterized sites, with better specificity and sensitivity than other similar computational methods. Moreover, a large-scale comparison using synthetic data showed that in the majority of cases our method performed significantly better than a nucleotide weight matrix-based method.

Conclusion: The search engine, available at http://mapper.chip.org, allows the identification, visualization and selection of putative TFBSs occurring in the promoter or other regions of a gene from the human, mouse, fly, worm and yeast genomes. In addition, it allows the user to upload a sequence to query and to build a model by supplying a multiple sequence alignment of binding sites for a transcription factor of interest. Owing to its extensive database of models, powerful search engine and flexible interface, MAPPER represents an effective resource for the large-scale computational analysis of transcriptional regulation.
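The general idea of building a model from an alignment of known sites and using it to scan a sequence can be illustrated with a small sketch. This is not the MAPPER implementation: it assumes an ungapped alignment and keeps only per-column emission scores, which reduces the model to the simpler weight matrix approach that the abstract uses as a baseline; a full profile hidden Markov model would add insert and delete states. The function names `build_profile` and `scan`, the uniform background, the pseudocount, the example sites and the threshold are all illustrative assumptions.

```python
import math

ALPHABET = "ACGT"


def build_profile(sites, pseudocount=1.0, background=0.25):
    """Turn an ungapped alignment of equal-length sites into per-column
    log-odds scores (log2 of column frequency over a uniform background)."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    profile = []
    for col in range(length):
        # Start every base at the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[col].upper()] += 1
        total = sum(counts.values())
        profile.append(
            {base: math.log2(counts[base] / total / background) for base in ALPHABET}
        )
    return profile


def scan(sequence, profile, threshold):
    """Slide the profile along the sequence (forward strand only) and
    return (start, window, score) for windows scoring at or above threshold."""
    width = len(profile)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in ALPHABET for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(profile[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    # Hypothetical aligned sites, invented for illustration only.
    example_sites = ["TGACGTCA", "TGACGTAA", "TGACGCAA", "TTACGTCA"]
    model = build_profile(example_sites)
    for hit in scan("GGGTGACGTCAGGG", model, threshold=5.0):
        print(hit)
```

The score threshold controls the trade-off between sensitivity and specificity mentioned in the abstract: lowering it reports more candidate sites at the cost of more false positives.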
