Discovery and characterization of chromatin states for systematic annotation of the human genome

A plethora of epigenetic modifications have been described in the human genome and shown to play diverse roles in gene regulation, cellular differentiation and the onset of disease. Although individual modifications have been linked to the activity levels of various genetic functional elements, their combinatorial patterns are still unresolved and their potential for systematic de novo genome annotation remains untapped. Here, we use a multivariate Hidden Markov Model to reveal 'chromatin states' in human T cells, based on recurrent and spatially coherent combinations of chromatin marks. We define 51 distinct chromatin states, including promoter-associated, transcription-associated, active intergenic, large-scale repressed and repeat-associated states. Each chromatin state shows specific enrichments in functional annotations, sequence motifs and specific experimentally observed characteristics, suggesting distinct biological roles. This approach provides a complementary functional annotation of the human genome that reveals the genome-wide locations of diverse classes of epigenetic function.
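
As an illustration of how a multivariate Hidden Markov Model can assign a chromatin state to each genomic interval, the following is a minimal sketch and not the implementation used in this work. It assumes that each mark is binarized to present/absent per interval and that marks are independent given the state (a product of Bernoulli emissions); the function name `viterbi_bernoulli`, the parameter layout, and the two-state toy parameters are all hypothetical. It decodes the most likely state path with the standard Viterbi algorithm, given model parameters that would in practice be learned from data.

```python
import numpy as np


def viterbi_bernoulli(obs, start, trans, emit):
    """Most likely state path for an HMM with binary multi-mark emissions.

    obs:   (T, M) 0/1 array, presence of each of M marks in T genomic bins
    start: (K,)   initial state probabilities
    trans: (K, K) transition probabilities, rows sum to 1
    emit:  (K, M) probability that mark m is present in state k
    """
    obs = np.asarray(obs)
    eps = 1e-12  # guards against log(0)
    # Log-likelihood of every bin under every state, assuming marks are
    # independent given the state (an assumption of this sketch).
    log_e = obs @ np.log(emit + eps).T + (1 - obs) @ np.log(1 - emit + eps).T
    log_t = np.log(trans + eps)
    T, K = log_e.shape

    score = np.log(start + eps) + log_e[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_t      # cand[i, j]: move from state i to j
        back[t] = cand.argmax(axis=0)      # best predecessor of each state j
        score = cand.max(axis=0) + log_e[t]

    # Trace back from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


# Toy example (hypothetical numbers): 2 states, 3 marks, 6 bins.
start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
emit = np.array([[0.9, 0.8, 0.1],    # state 0: marks 0 and 1 usually present
                 [0.1, 0.1, 0.7]])   # state 1: mark 2 usually present
obs = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 0],
                [0, 0, 1], [0, 0, 1], [0, 0, 0]])
print(viterbi_bernoulli(obs, start, trans, emit))
```

The high self-transition probabilities in the toy parameters favor spatially coherent runs of the same state, so a single bin with a partly atypical mark combination tends to stay in the state of its neighbors.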
