The DNA methylation haplotype (mHap) format and mHapTools

SUMMARY Bisulfite sequencing (BS-seq) is currently the gold standard for measuring genome-wide DNA methylation profiles at single-nucleotide resolution. Most analyses focus on mean CpG methylation and ignore methylation states on the same DNA fragments [DNA methylation haplotypes (mHaps)]. Here, we propose mHap, a simple DNA mHap format for storing DNA BS-seq data. This format reduces the size of a BAM file by 40- to 140-fold while retaining complete read-level CpG methylation information. It is also compatible with the Tabix tool for fast and random access. We implemented a command-line tool, mHapTools, for converting BAM/SAM files from existing platforms to mHap files as well as post-processing DNA methylation data in mHap format. With this tool, we processed all publicly available human reduced representation bisulfite sequencing data and provided these data as a comprehensive mHap database.

AVAILABILITY AND IMPLEMENTATION https://jiantaoshi.github.io/mHap/index.html.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
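The distinction the summary draws between mean CpG methylation and read-level methylation haplotypes can be illustrated with a minimal sketch. The data and the function names (`mean_methylation`, `haplotype_counts`) are hypothetical and chosen only for illustration; this is not the mHap file layout or the mHapTools implementation. It assumes each read is encoded as a string over the same four CpG sites, with "1" for a methylated and "0" for an unmethylated call.

```python
from collections import Counter

# Hypothetical read-level calls over the same four CpG sites.
# Each string is one read: "1" = methylated CpG, "0" = unmethylated CpG.
sample_a = ["1111", "1111", "0000", "0000"]  # reads are fully methylated or fully unmethylated
sample_b = ["1010", "0101", "1100", "0011"]  # every read carries a mixed pattern


def mean_methylation(reads):
    """Fraction of methylated calls over all CpG calls in all reads."""
    calls = "".join(reads)
    return calls.count("1") / len(calls)


def haplotype_counts(reads):
    """Number of reads carrying each distinct methylation pattern."""
    return Counter(reads)


# Both samples have the same mean methylation...
print(mean_methylation(sample_a), mean_methylation(sample_b))
# ...but their read-level patterns differ.
print(haplotype_counts(sample_a))
print(haplotype_counts(sample_b))
```

Both toy samples have a mean methylation of 0.5, yet only the per-read patterns reveal that one is a mixture of two uniform states while the other consists of heterogeneous reads. This is the kind of information that is lost when only per-CpG averages are kept.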
