An accurate and rapid continuous wavelet dynamic time warping algorithm for end‐to‐end mapping in ultra‐long nanopore sequencing

Motivation Nanopore sequencing promises long reads, point‐of‐care use and polymerase chain reaction‐free sample preparation. Among the steps in nanopore data analysis, the end‐to‐end mapping between the raw electrical current signal sequence and the expected signal sequence of the reference is the key building block for signal labeling and for the downstream tasks of signal visualization, variant identification and methylation detection. A classic algorithm for this signal mapping problem is dynamic time warping (DTW). However, ultra‐long nanopore reads and an order‐of‐magnitude difference in sampling rate between the two sequences complicate the problem and make classical DTW infeasible.

Results Here, we propose a novel multi‐level DTW algorithm, continuous wavelet DTW (cwDTW), based on continuous wavelet transforms of the two signal sequences at different scales. Our algorithm starts from low‐resolution wavelet transforms of the two sequences, such that the transformed sequences are short and have similar sampling rates. The peaks and nadirs of the transformed sequences are then extracted to form feature sequences of similar length, which can be easily mapped by the original DTW. The algorithm then recursively projects the warping path from a lower‐resolution level to a higher‐resolution one by building a context‐dependent boundary and performing a constrained search for the warping path within that boundary. Comprehensive experiments on two real nanopore datasets, one from human and one from Pandoraea pnomenusa, demonstrate the efficiency and effectiveness of the proposed algorithm. In particular, cwDTW achieves remarkable acceleration with only a small loss in alignment accuracy. On the real nanopore datasets, cwDTW can finish an alignment task in a few seconds, which is about 3000 times faster than the original DTW. By successfully applying cwDTW to the tasks of signal labeling and ultra‐long sequence comparison, we further demonstrate the power and applicability of cwDTW.

Availability and implementation Our program is available at https://github.com/realbigws/cwDTW.

Supplementary information Supplementary data are available at Bioinformatics online.
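The coarse‐to‐fine idea described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation. As a simplifying assumption, plain block averaging (`coarsen`) stands in for the low‐resolution continuous wavelet transform and the peak/nadir feature extraction, and a fixed `radius` stands in for the context‐dependent boundary. All function and parameter names here (`dtw`, `coarsen`, `project_window`, `multilevel_dtw`, `factor`, `radius`, `min_len`) are hypothetical and chosen only for illustration.

```python
import math


def dtw(x, y, window=None):
    """Classic DTW with absolute-difference cost.

    `window`, if given, is a list of (lo, hi) inclusive 0-based column
    bounds, one per row of x; cells outside the bounds are never visited.
    Returns (total_cost, warping_path).
    """
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo, hi = (0, m - 1) if window is None else window[i - 1]
        for j in range(lo + 1, hi + 2):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the last cell to the first one.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def coarsen(x, factor):
    """Assumption: block averaging as a stand-in for a low-resolution transform."""
    chunks = [x[k:k + factor] for k in range(0, len(x), factor)]
    return [sum(c) / len(c) for c in chunks]


def project_window(coarse_path, n, m, factor, radius):
    """Project a coarse warping path onto the finer grid as per-row column bounds."""
    lo, hi = [m - 1] * n, [0] * n
    for a, b in coarse_path:
        c_lo = max(0, b * factor - radius)
        c_hi = min(m - 1, (b + 1) * factor - 1 + radius)
        for i in range(max(0, a * factor - radius), min(n, (a + 1) * factor + radius)):
            lo[i] = min(lo[i], c_lo)
            hi[i] = max(hi[i], c_hi)
    return list(zip(lo, hi))


def multilevel_dtw(x, y, factor=2, radius=2, min_len=16):
    """Recursive coarse-to-fine DTW: solve coarsely, then search near that path."""
    if min(len(x), len(y)) <= min_len:
        return dtw(x, y)  # coarsest level: unconstrained DTW is cheap here
    _, coarse_path = multilevel_dtw(
        coarsen(x, factor), coarsen(y, factor), factor, radius, min_len
    )
    window = project_window(coarse_path, len(x), len(y), factor, radius)
    return dtw(x, y, window)


if __name__ == "__main__":
    # Two samplings of the same curve at different rates (toy data).
    x = [math.sin(0.1 * k) for k in range(120)]
    y = [math.sin(0.06 * k) for k in range(200)]
    full_cost, _ = dtw(x, y)
    fast_cost, fast_path = multilevel_dtw(x, y)
    print(full_cost, fast_cost, len(fast_path))
```

Because the constrained search visits only cells near the projected path, the work at each level grows roughly with the path length rather than with the product of the two sequence lengths. The constrained cost can never be lower than the unconstrained optimum, so the gap between the two gives a simple measure of the accuracy lost to the boundary.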
