A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

Citation: The ENCODE Project Consortium (2011) A User’s Guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol 9(4): e1001046. doi:10.1371/journal.pbio.1001046

Academic Editor: Peter B. Becker, Adolf Butenandt Institute, Germany

Received September 23, 2010; Accepted March 10, 2011; Published April 19, 2011

Copyright: © 2011 The ENCODE Project Consortium. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: Funded by the National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA. The role of the NIH Project Management Group in the preparation of this paper was limited to coordination and scientific management of the ENCODE Consortium.

Competing Interests: The authors have declared that no competing interests exist.

Abbreviations: 3C, Chromosome Conformation Capture; API, application programming interface; CAGE, Cap-Analysis of Gene Expression; ChIP, chromatin immunoprecipitation; DCC, Data Coordination Center; DHS, DNaseI hypersensitive site; ENCODE, Encyclopedia of DNA Elements; EPO, Enredo, Pecan, Ortheus approach; FDR, false discovery rate; GEO, Gene Expression Omnibus; GWAS, genome-wide association studies; IDR, Irreproducible Discovery Rate; Methyl-seq, sequencing-based methylation determination assay; NHGRI, National Human Genome Research Institute; PASRs, promoter-associated short RNAs; PET, Paired-End diTag; RACE, Rapid Amplification of cDNA Ends; RNA Pol2, RNA polymerase 2; RBP, RNA-binding protein; RRBS, Reduced Representation Bisulfite Sequencing; SRA, Sequence Read Archive; TAS, trait/disease-associated SNP; TF, transcription factor; TSS, transcription start site

* E-mail: rmyers@hudsonalpha.org (RMM); jstam@u.washington.edu (JS); mpsnyder@stanford.edu (MS); dunham@ebi.ac.uk (ID); rch8@psu.edu (RCH); bernstein.bradley@mgh.harvard.edu (BEB); gingeras@cshl.edu (TRG); kent@soe.ucsc.edu (WJK); birney@ebi.ac.uk (EB); woldb@caltech.edu (BW); greg.crawford@duke.edu (GEC)

Membership of the ENCODE Project Consortium is provided in the Acknowledgments.
