Demultiplexing barcoded Oxford Nanopore reads with deep convolutional neural networks

Multiplexing, the simultaneous sequencing of multiple barcoded DNA samples on a single flow cell, has made Oxford Nanopore sequencing cost-effective for small genomes. However, it depends on the ability to sort the resulting sequencing reads by barcode, and current demultiplexing tools fail to classify many reads. Here we present Deepbinner, a tool for Oxford Nanopore demultiplexing that uses a deep neural network to classify reads based on the raw electrical read signal. This ‘signal-space’ approach allows for greater accuracy than existing ‘base-space’ tools (Albacore and Porechop) for which signals must first be converted to DNA base calls, itself a complex problem that can introduce noise into the barcode sequence. To assess Deepbinner and existing tools, we performed multiplex sequencing on 12 amplicons chosen for their distinguishability. This allowed us to establish a ground truth classification for each read based on internal sequence alone. Deepbinner had the lowest rate of unclassified reads (7.8%) and the highest demultiplexing precision (98.5% of classified reads were correctly assigned). It can be used alone (to maximise the number of classified reads) or in conjunction with other demultiplexers (to maximise precision and minimise false positive classifications). We also found cross-sample chimeric reads (0.3%) and evidence of barcode switching (0.3%) in our dataset, which likely arise during library preparation and may be detrimental for quantitative studies that use multiplexing. Deepbinner is open source (GPLv3) and available at https://github.com/rrwick/Deepbinner.

[1]  M. Hetman,et al.  Ribosomal stress and Tp53-mediated neuronal apoptosis in response to capsid protein of the Zika virus , 2017, Scientific Reports.

[2]  David J. Jilk,et al.  Recurrent Processing during Object Recognition , 2011, Front. Psychol..

[3]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Sergey I. Nikolenko,et al.  SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing , 2012, J. Comput. Biol..

[5]  David J. Fleet,et al.  Computer Vision – ECCV 2014 , 2014, Lecture Notes in Computer Science.

[6]  Minh Duc Cao,et al.  Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning , 2017, bioRxiv.

[7]  Vladim'ir Bovza,et al.  Improving Nanopore Reads Raw Signal Alignment , 2017, 1705.01620.

[8]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[9]  Heng Li,et al.  Minimap2: pairwise alignment for nucleotide sequences , 2017, Bioinform..

[10]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[11]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[12]  Antti Oulasvirta,et al.  Computer Vision – ECCV 2006 , 2006, Lecture Notes in Computer Science.

[13]  M. Botvinick,et al.  Neural representations of events arise from temporal community structure , 2013, Nature Neuroscience.

[14]  Juan de Lara,et al.  Supporting user-oriented analysis for multi-view domain-specific visual languages , 2009, Inf. Softw. Technol..

[15]  Guido Jenster,et al.  CGtag: complete genomics toolkit and annotation in a cloud-based Galaxy , 2014, GigaScience.

[16]  Aaron R Quinlan,et al.  A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer , 2014, GigaScience.

[17]  Ryan R. Wick,et al.  Completing bacterial genome assemblies with multiplex MinION sequencing , 2017, bioRxiv.

[18]  Patrick M. Pilarski,et al.  First steps towards an intelligent laser welding architecture using deep neural networks and reinforcement learning , 2014 .

[19]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[20]  James B. Brown,et al.  BasecRAWller: Streaming Nanopore Basecalling Directly from Raw Signal , 2017, bioRxiv.

[21]  Luis Perez,et al.  The Effectiveness of Data Augmentation in Image Classification using Deep Learning , 2017, ArXiv.

[22]  Matthew Loose,et al.  Real-time selective sequencing using nanopore technology , 2016, Nature Methods.

[23]  Ryan R. Wick,et al.  Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads , 2016, bioRxiv.

[24]  Michael Liem,et al.  Rapid de novo assembly of the European eel genome from nanopore sequencing reads , 2017, Scientific Reports.

[25]  Martin Sosic,et al.  Edlib: a C/C++ library for fast, exact sequence alignment using edit distance , 2016, bioRxiv.

[26]  N. Loman,et al.  A complete bacterial genome assembled de novo using only nanopore sequencing data , 2015, Nature Methods.

[27]  I. Weissman,et al.  Index switching causes “spreading-of-signal” among multiplexed samples in Illumina HiSeq 4000 DNA sequencing , 2017, bioRxiv.

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Tomáš Vinař,et al.  DeepNano: Deep recurrent neural networks for base calling in MinION nanopore reads , 2016, PloS one.

[30]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[31]  S. Beck Multiplex DNA sequencing. , 1993, Methods in molecular biology.

[32]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[33]  Chengzhu Yu,et al.  The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[34]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[35]  Haley R Pipkins,et al.  Polyamine transporter potABCD is required for virulence of encapsulated but not nonencapsulated Streptococcus pneumoniae , 2017, PloS one.

[36]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[37]  Amy Loutfi,et al.  A review of unsupervised feature learning and deep learning for time-series modeling , 2014, Pattern Recognit. Lett..

[38]  Ken Perlin,et al.  An image synthesizer , 1988 .