Simulation of Nanopore Sequencing Signals Based on BiGRU

Oxford Nanopore sequencing is an important sequencing technology that reads a nucleotide sequence by detecting changes in the electrical current as a DNA molecule is driven through a biological nanopore. Signal simulation for nanopore sequencing is highly desirable for developing methods for nanopore sequencing applications. To improve simulation accuracy, we propose a novel signal simulation method based on Bi-directional Gated Recurrent Units (BiGRU). In this method, a BiGRU-based signal processing model replaces the traditional low-pass filter in post-processing the ground-truth signal, which is calculated from the input nucleotide sequence and the nanopore sequencing pore model. Gaussian noise is then added to the filtered signal to generate the final simulated signal. By learning from experimental sequencing data, the method can accurately model the relation between the ground-truth signal and the real-world sequencing signal. The simulation results show that the proposed method, by exploiting the learning ability of the neural network, generates simulated signals that are closer to real-world sequencing signals in both the time and frequency domains than those of the existing simulation method.
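The three stages described above (ground-truth signal from a pore model, BiGRU post-processing, additive Gaussian noise) can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the pore model is a hypothetical table of made-up 3-mer current levels, the fixed dwell time of 4 samples per k-mer is an assumption, and the BiGRU uses small random, untrained weights, whereas the method in the paper learns its weights from experimental sequencing data. All function names and parameters here are hypothetical.

```python
import itertools
import math
import random

K = 3  # toy k-mer length (assumption; not taken from the paper)


def toy_pore_model(rng):
    """Hypothetical k-mer -> mean current (pA) table with made-up values."""
    return {"".join(k): rng.uniform(60.0, 120.0)
            for k in itertools.product("ACGT", repeat=K)}


def ground_truth_signal(seq, model, dwell=4):
    """Expected current: each k-mer's level repeated for `dwell` samples."""
    signal = []
    for i in range(len(seq) - K + 1):
        signal.extend([model[seq[i:i + K]]] * dwell)
    return signal


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_gru(hidden, rng):
    """Random (untrained) weights for a GRU that takes a scalar input."""
    def vec():
        return [rng.gauss(0, 0.3) for _ in range(hidden)]

    def mat():
        return [vec() for _ in range(hidden)]

    return {g: {"w": vec(), "u": mat(), "b": vec()} for g in ("z", "r", "n")}


def gru_pass(xs, p, hidden):
    """Run a GRU over xs and return the hidden state at every time step."""
    h = [0.0] * hidden
    states = []
    for x in xs:
        def pre(gate, hv):
            # Pre-activation of one gate: w * x + U @ hv + b
            return [p[gate]["w"][j] * x + p[gate]["b"][j]
                    + sum(p[gate]["u"][j][k] * hv[k] for k in range(hidden))
                    for j in range(hidden)]

        z = [sigmoid(v) for v in pre("z", h)]          # update gate
        r = [sigmoid(v) for v in pre("r", h)]          # reset gate
        rh = [r[j] * h[j] for j in range(hidden)]
        n = [math.tanh(v) for v in pre("n", rh)]       # candidate state
        h = [(1 - z[j]) * n[j] + z[j] * h[j] for j in range(hidden)]
        states.append(h)
    return states


def bigru_filter(signal, fwd, bwd, w_out, hidden):
    """Stand-in for the learned filter: a BiGRU plus a linear read-out."""
    mean = sum(signal) / len(signal)
    std = (sum((s - mean) ** 2 for s in signal) / len(signal)) ** 0.5 or 1.0
    xs = [(s - mean) / std for s in signal]            # normalise the input
    hf = gru_pass(xs, fwd, hidden)                     # left-to-right pass
    hb = gru_pass(xs[::-1], bwd, hidden)[::-1]         # right-to-left pass
    out = []
    for f, b in zip(hf, hb):
        y = sum(w * v for w, v in zip(w_out, f + b))   # concatenate, project
        out.append(y * std + mean)                     # back to pA scale
    return out


def simulate(seq, noise_std=1.5, hidden=4, seed=0):
    """Pore model -> ground-truth signal -> BiGRU filter -> Gaussian noise."""
    rng = random.Random(seed)
    model = toy_pore_model(rng)
    truth = ground_truth_signal(seq, model)
    fwd, bwd = init_gru(hidden, rng), init_gru(hidden, rng)
    w_out = [rng.gauss(0, 0.3) for _ in range(2 * hidden)]
    filtered = bigru_filter(truth, fwd, bwd, w_out, hidden)
    return [v + rng.gauss(0, noise_std) for v in filtered]


simulated = simulate("ACGTTGCAAC")
print(len(simulated))
```

Because the forward and backward passes are concatenated at each time step, every output sample depends on both earlier and later parts of the ground-truth signal, which is what distinguishes a bidirectional recurrent filter from a causal low-pass filter. With untrained weights the output is not realistic; the sketch only shows how the pieces fit together.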
