Large-scale epigenome imputation improves data quality and disease variant enrichment

With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.
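
As a rough illustration of the idea of predicting one signal track from correlated tracks with an ensemble of regression trees, the following minimal sketch may help. It is not the ChromImpute implementation: the depth-1 trees, the bootstrap averaging, the synthetic data, and all names (`fit_stump`, `fit_ensemble`, `other_mark`, `other_sample`) are assumptions made purely for illustration.

```python
import random

def fit_stump(X, y):
    """Fit a depth-1 regression tree: choose the (feature, threshold)
    split that minimizes the summed squared error of the two leaf means."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no valid split: fall back to predicting the mean
        m = sum(y) / len(y)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr

def fit_ensemble(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap resample of the training positions."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def predict_ensemble(trees, row):
    """Impute a value as the average prediction over all trees."""
    return sum(predict_stump(s, row) for s in trees) / len(trees)

# Synthetic toy data (assumption): each row is one genomic position with two
# features, the signal of another mark in the same sample and the signal of
# the target mark in another sample. The target correlates with both.
rng = random.Random(1)
X, y = [], []
for _ in range(200):
    other_mark = rng.random()
    other_sample = rng.random()
    X.append([other_mark, other_sample])
    y.append(0.3 * other_mark + 0.7 * other_sample + rng.gauss(0, 0.05))

# Train on 150 positions, then impute the held-out 50.
trees = fit_ensemble(X[:150], y[:150])
imputed = [predict_ensemble(trees, row) for row in X[150:]]
```

In this toy setting, the imputed values can be compared against the held-out values in `y[150:]`, in the same spirit as comparing imputed tracks with the subset of datasets that were also experimentally observed.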
