An End-to-end Oxford Nanopore Basecaller Using Convolution-augmented Transformer

Oxford Nanopore sequencing is rapidly becoming an active field in genomics, and accurate basecalling of nucleotide sequences from the complex electrical signals is critical. Many efforts have been devoted to developing new basecalling tools over the years. However, basecalled reads still suffer from high error rates, and basecalling remains slow. Here, we developed an open-source basecalling method, CATCaller, which simultaneously captures global context through attention and models local dependencies through dynamic convolution. The method was shown to consistently outperform the ONT default basecallers Albacore and Guppy, as well as a recently developed attention-based method, SACall, in read accuracy. More importantly, our method is fast, using a heterogeneous computing model that integrates both CPUs and GPUs. Compared to SACall, the method is nearly 4 times faster on a single GPU and scales well under parallelization, with a further 3.3-fold speedup on a four-GPU node.
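
The combination of a global attention branch with a local dynamic-convolution branch can be illustrated with a minimal NumPy sketch. This is not the CATCaller implementation: the function names, the single attention head, the even channel split between the two branches, and the kernel size of 3 are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention: every position attends
    # to every other position, which gives the global context.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def dynamic_conv(x, w_kernel, kernel_size):
    # Simplified dynamic convolution: each position predicts its own
    # softmax-normalized kernel and applies it to its local window.
    # kernel_size is assumed to be odd so the window is centered.
    T, _ = x.shape
    pad = kernel_size // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))      # zero-pad the time axis
    kernels = softmax(x @ w_kernel, axis=-1)      # shape (T, kernel_size)
    windows = np.stack([padded[t:t + kernel_size] for t in range(T)])  # (T, kernel_size, d)
    return np.einsum("tk,tkd->td", kernels, windows)

def hybrid_block(x, params, kernel_size=3):
    # Hypothetical two-branch block: half of the channels go through
    # attention, the other half through dynamic convolution.
    half = x.shape[1] // 2
    a = self_attention(x[:, :half], *params["attn"])
    c = dynamic_conv(x[:, half:], params["conv"], kernel_size)
    return np.concatenate([a, c], axis=1)

rng = np.random.default_rng(0)
T, d = 8, 4                                       # toy sequence length and width
x = rng.normal(size=(T, d))
params = {
    "attn": [rng.normal(size=(d // 2, d // 2)) for _ in range(3)],
    "conv": rng.normal(size=(d // 2, 3)),
}
y = hybrid_block(x, params)
print(y.shape)  # (8, 4)
```

In this sketch the attention branch mixes information across the whole sequence, while the convolution branch only sees a three-step window whose weights depend on the current position.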
