gemBS: high throughput processing for DNA methylation data from bisulfite sequencing

Motivation: DNA methylation is essential for normal embryogenesis and development in mammals and can be captured at single-base-pair resolution by whole genome bisulfite sequencing (WGBS). Currently available analysis tools are rapidly becoming outdated because they lack the functionality and efficiency needed to handle the large volumes of data now commonly generated.

Results: We developed gemBS, a fast, high‐throughput bioinformatics pipeline designed specifically for large-scale BS‐Seq analysis. It combines a high-performance BS‐mapper (GEM3) with a variant caller built specifically for BS‐Seq data (BScall). gemBS provides genotype information and methylation estimates for all genomic cytosines in different contexts (CpG and non‐CpG), together with a set of quality reports for comprehensive and reproducible analysis. gemBS is highly modular and can be easily automated, while producing robust and accurate results.

Availability and implementation: gemBS is released under the GNU GPLv3+ license. Source code and documentation are freely available from www.statgen.cat/gemBS.

Supplementary information: Supplementary data are available at Bioinformatics online.
