METHimpute: Imputation-guided construction of complete methylomes from WGBS data

Whole-genome bisulfite sequencing (WGBS) has become the standard method for interrogating plant methylomes at base resolution. However, deep WGBS measurements remain cost-prohibitive for large, complex genomes and for population-level studies. As a result, most published plant methylomes are sequenced far below saturation, and a large proportion of cytosines have either missing data or insufficient coverage. Here we present METHimpute, a hidden Markov model (HMM)-based imputation algorithm for the analysis of WGBS data. Unlike existing methods, METHimpute enables the construction of complete methylomes by inferring the methylation status and level of all cytosines in the genome, regardless of coverage. Application of METHimpute to maize, rice and Arabidopsis shows that the algorithm infers cytosine-resolution methylomes with high accuracy from data with as little as 6X coverage, as judged against 60X data, making it a cost-effective solution for large-scale studies. Although METHimpute has been extensively tested in plants, it should be broadly applicable to other species.
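
To illustrate how an HMM can assign a methylation status to cytosines that have no reads, the following is a minimal sketch and not the METHimpute implementation. It assumes a two-state chain (unmethylated, methylated), binomial emissions on methylated/total read counts, and fixed, hand-picked parameters; the function name `impute_posteriors` and all parameter values are hypothetical. A position with zero coverage has an uninformative emission, so its posterior is determined by its neighbours through the transition probabilities.

```python
import math

def binom_pmf(k, n, p):
    # Probability of k methylated reads out of n under per-read rate p.
    # For n == 0 this evaluates to 1, i.e. the position is uninformative.
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def impute_posteriors(counts, p_unmeth=0.05, p_meth=0.9, stay=0.9):
    """Posterior P(methylated) per cytosine via forward-backward.

    counts: list of (methylated_reads, total_reads) in genomic order.
    p_unmeth, p_meth, stay: assumed (hypothetical) model parameters.
    """
    rates = (p_unmeth, p_meth)
    trans = ((stay, 1 - stay), (1 - stay, stay))
    emis = [[binom_pmf(m, n, r) for r in rates] for m, n in counts]
    T = len(counts)

    # Forward pass, rescaled at each step to avoid numerical underflow.
    fwd = []
    for t in range(T):
        if t == 0:
            a = [0.5 * emis[0][s] for s in range(2)]
        else:
            a = [sum(fwd[t - 1][r] * trans[r][s] for r in range(2)) * emis[t][s]
                 for s in range(2)]
        z = sum(a)
        fwd.append([x / z for x in a])

    # Backward pass, also rescaled.
    bwd = [[1.0, 1.0] for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b = [sum(trans[s][r] * emis[t + 1][r] * bwd[t + 1][r] for r in range(2))
             for s in range(2)]
        z = sum(b)
        bwd[t] = [x / z for x in b]

    # Combine and normalise to obtain the posterior of the methylated state.
    post = []
    for t in range(T):
        g = [fwd[t][s] * bwd[t][s] for s in range(2)]
        post.append(g[1] / (g[0] + g[1]))
    return post

# The middle cytosine has no reads; its status is inferred from its neighbours.
posteriors = impute_posteriors([(9, 10), (0, 0), (8, 10)])
```

In this toy example the uncovered middle position receives a high posterior probability of being methylated because both flanking positions are strongly methylated. A real analysis would estimate the transition and emission parameters from the data rather than fixing them by hand.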
