EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era

The Eukaryotic Promoter Database (EPD), available online at http://epd.vital-it.ch, is a collection of experimentally defined eukaryotic Pol II promoters that has been maintained for more than 25 years. A promoter is represented by a single position in the genome, typically the major transcription start site (TSS). EPD primarily serves biologists interested in analysing the motif content, chromatin structure or DNA methylation status of co-regulated promoter subsets. Initially, promoter evidence came from TSS mapping experiments targeted at single genes and published in journal articles. Today, the TSS positions provided by EPD are inferred from next-generation sequencing data distributed in electronic form. Traditionally, EPD has been a high-quality database with low coverage. Recent efforts have focused on reaching complete gene coverage for important model organisms. To this end, we introduced a new section called EPDnew, which is automatically assembled from multiple, carefully selected input datasets. As another novelty, we started to use chromatin signatures in addition to mRNA 5′ tags to locate promoters of weakly expressed genes. Regarding user interfaces, we introduced a new promoter viewer that enables users to explore promoter-defining experimental evidence in a UCSC Genome Browser window.
