CGmapTools improves the precision of heterozygous SNV calls and supports allele-specific methylation detection and visualization in bisulfite-sequencing data

Motivation: DNA methylation is important for gene silencing and imprinting in both plants and animals. Recent advances in bisulfite sequencing allow single nucleotide variations (SNVs) to be detected with high sensitivity, but accurately identifying heterozygous SNVs from partially C-to-T converted sequences remains challenging.

Results: We designed two methods, BayesWC and BinomWC, that substantially improved the precision of heterozygous SNV calls from ~80% to 99% while retaining comparable recall. Using these SNV calls, we provide functions for allele-specific DNA methylation (ASM) analysis and for visualizing the methylation status of individual reads. Applying ASM analysis to a previously published dataset, we found that on average 1.5% of the investigated regions showed allelic methylation; these regions were significantly enriched in transposon elements and were likely to be shared within the same cell type. For calling differentially methylated regions (DMRs) in low-coverage data, we used a dynamic fragment strategy, which identified DMRs related to key genes involved in tumorigenesis in a public cancer dataset. Finally, we integrated 40 applications into the software package CGmapTools for analyzing DNA methylomes. The package uses CGmap as its format interface, defines binary formats that reduce file size and support fast data retrieval, and supports context-wise, gene-wise, bin-wise, region-wise and sample-wise analyses and visualizations.

Availability and implementation: The CGmapTools software is freely available at https://cgmaptools.github.io/.
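CGmapTools uses CGmap as its format interface. As a rough illustration of what a context-wise summary over such a file involves, the sketch below pools methylated and total read counts per cytosine context (CG, CHG, CHH). It is not CGmapTools' own code. It assumes the eight tab-separated CGmap columns as written by BS-Seeker2 (chromosome, nucleotide C or G, position, context, dinucleotide context, methylation level, methylated count, total C+T count); the function name `context_methylation` and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def context_methylation(lines: Iterable[str]) -> Dict[str, float]:
    """Return the pooled methylation level for each cytosine context.

    Assumed CGmap layout (8 tab-separated columns):
    chrom, nucleotide (C/G), position, context, dinucleotide,
    methylation level, methylated C count, total C+T count.
    """
    meth = defaultdict(int)   # methylated read count summed per context
    total = defaultdict(int)  # covering (C+T) read count summed per context
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        context = fields[3]
        meth[context] += int(fields[6])
        total[context] += int(fields[7])
    # Pooled level = sum(mC) / sum(C+T); contexts with zero coverage are dropped
    return {ctx: meth[ctx] / total[ctx] for ctx in total if total[ctx] > 0}


# Hypothetical example records in the assumed CGmap layout
records = [
    "chr1\tC\t3000851\tCHH\tCC\t0.1\t1\t10",
    "chr1\tG\t3001632\tCG\tCG\t0.9\t9\t10",
    "chr1\tC\t3001628\tCG\tCG\t0.5\t5\t10",
]

if __name__ == "__main__":
    print(context_methylation(records))
```

Pooling counts weights each site by its coverage; averaging the per-site levels in column 6 instead would give every site equal weight, which behaves differently in low-coverage data.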