NanoDJ: A Dockerized Jupyter Notebook for Interactive Oxford Nanopore MinION Sequence Manipulation and Genome Assembly

Background: The Oxford Nanopore Technologies (ONT) MinION portable sequencer makes it possible to use cutting-edge genomic technologies in the field and in the academic classroom.

Results: We present NanoDJ, a Jupyter notebook integration of tools for simplified manipulation and assembly of DNA sequences produced by ONT devices. It integrates basecalling, read trimming and quality control, simulation, and plotting routines with a variety of widely used aligners and assemblers, including procedures for hybrid assembly.

Conclusions: By combining Jupyter-facilitated access to self-explanatory application contents with interactive visualization of results, and by being distributed as a Docker software container, NanoDJ aims to simplify ONT DNA sequence analysis and make it more reproducible. The NanoDJ package code, documentation, and installation instructions are freely available at https://github.com/genomicsITER/NanoDJ.
