Landscape of CpG methylation of individual repetitive elements

Motivation: Determining the methylation state of regions with high copy numbers is challenging for second-generation sequencing because the read length is insufficient to map reads uniquely, especially when repetitive regions are long and nearly identical to each other. Single-molecule real-time (SMRT) sequencing is a promising method for observing such regions because it is not vulnerable to GC bias, it produces long reads, and its kinetic information is sensitive to DNA modifications.

Results: We propose a novel linear-time algorithm that combines the kinetic information for neighboring CpG sites and increases the confidence in identifying the methylation states of those sites. Using a practical read coverage of ∼30-fold from an inbred medaka (Oryzias latipes) strain, we observed that both the sensitivity and precision of our method on individual CpG sites were ∼93.7%. We also observed a high correlation coefficient (R = 0.884) between our method and bisulfite sequencing, and for 92.0% of CpG sites, the methylation levels, which range over [0,1], agreed within an acceptable difference of 0.25. Using this method, we characterized the landscape of the methylation status of repetitive elements, such as LINEs, in the human genome, thereby revealing the strong correlation between CpG density and hypomethylation and detecting hypomethylation hot spots of LTRs and LINEs. We uncovered the methylation states of nearly identical active transposons: two novel LINE insertions of identity ∼99% and length 6050 base pairs (bp) in the human genome, and 16 Tol2 elements of identity >99.8% and length 4682 bp in the medaka genome.

Availability and Implementation: AgIn (Aggregate on Intervals) is available at: https://github.com/hacone/AgIn

Contact: ysuzuki@cb.k.u-tokyo.ac.jp or moris@cb.k.u-tokyo.ac.jp

Supplementary information: Supplementary data are available at Bioinformatics online.
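The two agreement measures reported in the Results (a correlation coefficient against bisulfite sequencing, and the fraction of CpG sites whose methylation levels agree within 0.25) can be illustrated with a minimal sketch. This is not code from AgIn: the function names, the choice of Pearson's coefficient for R, and the inclusive threshold (`<=`) are assumptions made for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of methylation levels.

    Assumption: the abstract's R is treated here as Pearson's coefficient.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def concordance_fraction(xs, ys, max_diff=0.25):
    """Fraction of sites whose two methylation levels differ by at most max_diff.

    Assumption: "within 0.25" is read as an inclusive bound.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    agree = sum(1 for x, y in zip(xs, ys) if abs(x - y) <= max_diff)
    return agree / len(xs)


# Hypothetical per-site methylation levels in [0, 1] from two methods.
smrt_levels = [0.0, 0.5, 1.0, 0.9]
bisulfite_levels = [0.1, 0.9, 1.0, 0.5]

r = pearson_r(smrt_levels, bisulfite_levels)
frac = concordance_fraction(smrt_levels, bisulfite_levels)
print(f"R = {r:.3f}, concordant fraction = {frac:.2f}")
```

In this toy input, two of the four sites differ by 0.4 and so fall outside the 0.25 threshold; the values are invented and do not reproduce the paper's data.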
