Comprehensive benchmarking of software for mapping whole genome bisulfite data: from read alignment to DNA methylation analysis

Whole genome bisulfite sequencing is currently at the forefront of epigenetic analysis, enabling nucleotide-level resolution of 5-methylcytosine (5mC) on a genome-wide scale. Specialised software has been developed to address the particular difficulties of aligning such sequencing reads to a given reference, building on the knowledge acquired from model organisms such as human and Arabidopsis thaliana. As the field of epigenetics expands its purview to non-model plant species, new challenges arise that call into question the suitability of previously established tools. Herein, nine short-read aligners are evaluated: Bismark, BS-Seeker2, BSMAP, BWA-meth, ERNE-BS5, GEM3, GSNAP, Last, and segemehl. Precision-recall analysis of simulated alignments, compared against real sequencing data obtained from three natural accessions, reveals that, on balance, BWA-meth and BSMAP make the best use of the data during mapping. The influence of difficult-to-map regions, characterised by deviations in sequencing depth over repeat annotations, is evaluated as the mean absolute deviation of the resulting methylation calls from a realistic methylome. Downstream methylation analysis is sensitive to how multi-mapping reads are handled with respect to mapping quality (MAPQ), and is potentially susceptible to bias arising from the increased sequence complexity of densely methylated reads.
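The abstract names two evaluation metrics, precision-recall of simulated alignments swept over MAPQ and the mean absolute deviation of methylation calls from a reference methylome, without defining them. The Python sketch below shows one common way to compute them; it is not the authors' code. Assumptions: each simulated read carries its true chromosome and start position, an aligned read counts as correct if it lands within a hypothetical `tolerance` of 5 bp of that start, reads below the MAPQ cut-off are treated as unaligned, and methylation calls are per-cytosine fractions keyed by (chromosome, position). The names `Alignment`, `precision_recall`, `pr_curve` and `mean_absolute_deviation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

Locus = Tuple[str, int]  # (chromosome, position); hypothetical key format


@dataclass
class Alignment:
    """One simulated read after mapping (hypothetical record, not a real library type)."""
    read_id: str
    chrom: Optional[str]  # None when the aligner left the read unmapped
    pos: Optional[int]    # leftmost mapped position
    mapq: int


def precision_recall(
    alignments: Iterable[Alignment],
    truth: Dict[str, Locus],
    min_mapq: int = 0,
    tolerance: int = 5,
) -> Tuple[float, float]:
    """Precision and recall at a single MAPQ cut-off.

    Assumed definitions: reads below `min_mapq` count as unaligned; an
    aligned read is correct if it lands on its true chromosome within
    `tolerance` bp of its simulated start position.
    """
    aligned = correct = 0
    for aln in alignments:
        if aln.chrom is None or aln.pos is None or aln.mapq < min_mapq:
            continue
        aligned += 1
        true_chrom, true_pos = truth[aln.read_id]
        if aln.chrom == true_chrom and abs(aln.pos - true_pos) <= tolerance:
            correct += 1
    precision = correct / aligned if aligned else 0.0  # correct / reported
    recall = correct / len(truth) if truth else 0.0    # correct / simulated
    return precision, recall


def pr_curve(
    alignments: Iterable[Alignment],
    truth: Dict[str, Locus],
    thresholds: Iterable[int],
) -> List[Tuple[int, float, float]]:
    """Sweep the MAPQ cut-off to trace a precision-recall curve."""
    alns = list(alignments)  # materialise once so it can be re-scanned per threshold
    return [(t, *precision_recall(alns, truth, min_mapq=t)) for t in thresholds]


def mean_absolute_deviation(
    called: Dict[Locus, float], reference: Dict[Locus, float]
) -> float:
    """Mean |called - reference| methylation level over cytosines present in both."""
    shared = called.keys() & reference.keys()
    if not shared:
        raise ValueError("no cytosines in common between calls and reference")
    return sum(abs(called[s] - reference[s]) for s in shared) / len(shared)


if __name__ == "__main__":
    truth = {"r1": ("chr1", 100), "r2": ("chr1", 500), "r3": ("chr2", 50), "r4": ("chr3", 10)}
    alns = [
        Alignment("r1", "chr1", 102, 42),  # correct, 2 bp off
        Alignment("r2", "chr1", 900, 3),   # misplaced, low MAPQ (multi-mapper-like)
        Alignment("r3", "chr2", 50, 30),   # correct
        Alignment("r4", None, None, 0),    # unmapped
    ]
    for t, p, r in pr_curve(alns, truth, [0, 10]):
        print(f"MAPQ>={t}: precision={p:.2f} recall={r:.2f}")
    # MAPQ>=0: precision=0.67 recall=0.50
    # MAPQ>=10: precision=1.00 recall=0.50

    called = {("chr1", 10): 0.9, ("chr1", 20): 0.1, ("chr1", 30): 0.5}
    reference = {("chr1", 10): 1.0, ("chr1", 20): 0.0, ("chr2", 5): 0.3}
    print(f"MAD={mean_absolute_deviation(called, reference):.3f}")  # MAD=0.100
```

In this toy run, raising the MAPQ cut-off from 0 to 10 discards the misplaced low-quality read, so precision rises while recall is unchanged; this is the kind of trade-off between multi-mapping reads and MAPQ that the abstract refers to. The MAD is computed only over cytosines present in both call sets, which is one of several reasonable choices.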
