Nanopore Sequencing Technology and Tools: Computational Analysis of the Current State, Bottlenecks, and Future Directions

Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, the high error rates of the technology pose a challenge for generating accurate genome assemblies. The tools used for nanopore sequence analysis are therefore of critical importance, as they must overcome these high error rates. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages, and performance bottlenecks. Understanding where current tools perform poorly is a prerequisite for developing better ones. To this end, we 1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and 2) provide guidelines for determining the appropriate tools for each step. We analyze various combinations of different tools and expose the tradeoffs between accuracy, performance, memory usage, and scalability. We conclude that our observations can guide researchers and practitioners in making informed and effective choices for each step of the genome assembly pipeline using nanopore sequence data. In addition, the bottlenecks we have identified can help developers improve the current tools or build new ones that are both accurate and fast, in order to overcome the high error rates of the nanopore sequencing technology.
