Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project
The ENCODE Project Consortium

We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new insights into transcription start sites, including their relationship to specific regulatory sequences and to features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
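
The statement that most bases lie in primary transcripts, many of which overlap one another, rests on an interval-coverage calculation: overlapping transcripts must be merged before their lengths are summed, or shared bases would be counted more than once. The following is a minimal Python sketch of that calculation, assuming transcripts and the target region are given as half-open (start, end) coordinates on a single chromosome. The function names are hypothetical, and this is an illustration of the arithmetic, not the consortium's actual analysis pipeline.

```python
from typing import Iterable, List, Tuple

# Half-open [start, end) interval in base-pair coordinates (assumed convention).
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals into disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(region: Interval, transcripts: Iterable[Interval]) -> float:
    """Fraction of bases in `region` that fall in at least one transcript."""
    r_start, r_end = region
    # Clip each transcript to the region and drop ones lying entirely outside it.
    clipped = [(max(s, r_start), min(e, r_end)) for s, e in transcripts]
    clipped = [(s, e) for s, e in clipped if s < e]
    covered = sum(e - s for s, e in merge_intervals(clipped))
    return covered / (r_end - r_start)


if __name__ == "__main__":
    region = (0, 1000)
    transcripts = [(100, 400), (300, 700), (650, 800), (900, 1200)]
    print(covered_fraction(region, transcripts))  # 0.8
```

In this toy example the clipped transcript lengths sum to 950 bases, but because three of them overlap, only 800 distinct bases (80% of the region) are actually covered; merging first is what prevents the overlap from inflating the estimate.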