A new parallel pipeline for DNA methylation analysis of long reads datasets

Background: DNA methylation is an important mechanism of epigenetic regulation in development and disease. New-generation sequencers allow genome-wide measurement of methylation status by reading short stretches of the DNA sequence (Methyl-seq). Several software tools for methylation analysis have been proposed in recent years. However, current sequencers, and those expected in the near future, yield reads of increasing length, which makes these tools inefficient and obsolete.

Results: In this paper, we propose a new software tool based on a strategy for methylation analysis of Methyl-seq sequencing data that requires much shorter execution times while yielding better sensitivity, particularly for datasets composed of long reads. This strategy can be exported to other methylation, DNA and RNA analysis tools.

Conclusions: The developed software tool achieves execution times one order of magnitude shorter than those of existing tools, while yielding equal sensitivity for short reads and even better sensitivity for long reads.
