Field-based species identification in eukaryotes using single molecule, real-time sequencing

Advances in DNA sequencing and informatics have revolutionised biology over the past four decades, but technological limitations have left many applications unexplored1,2. Recently, portable, real-time nanopore sequencing (RTnS) has become available, offering opportunities to rapidly collect and analyse genomic data anywhere3–5. However, the generation of datasets from large, complex genomes has been constrained to laboratories6,7. The portability and long DNA sequences of RTnS offer great potential for field-based species identification, but the feasibility and accuracy of these technologies for this purpose have not been assessed. Here, we show that a field-based RTnS analysis of closely related plant species (Arabidopsis spp.)8 has many advantages over laboratory-based high-throughput sequencing (HTS) methods for species-level identification-by-sequencing and de novo phylogenomics. Samples were collected and sequenced in a single day by RTnS using a portable, “al fresco” laboratory. Our analyses demonstrate that matching unknown RTnS reads against a reference database enables rapid and confident species identification. Individually annotated RTnS reads can be used to infer the evolutionary relationships of A. thaliana. Furthermore, hybrid genome assembly with RTnS and HTS reads substantially improved upon a genome assembled from HTS reads alone. Field-based RTnS makes real-time, rapid specimen identification and genome-wide analyses possible. These technological advances are set to revolutionise research in the biological sciences9 and have broad implications for conservation, taxonomy, border agencies and citizen science.

[1]  R. Crozier,et al.  A fuzzy‐set‐theory‐based approach to analyse species membership in DNA barcoding , 2012, Molecular ecology.

[2]  Peter M Hollingsworth,et al.  Telling plant species apart with DNA: from barcodes to genomes , 2016, Philosophical Transactions of the Royal Society B: Biological Sciences.

[3]  Arwyn Edwards,et al.  Extreme metagenomics using nanopore DNA sequencing: a field report from Svalbard, 78°N , 2016.

[4]  Lisa C. Crossman,et al.  Identification of bacterial pathogens and antimicrobial resistance directly from clinical urines by nanopore-based metagenomic sequencing , 2016, The Journal of antimicrobial chemotherapy.

[5]  Keith Bradnam,et al.  CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes , 2007, Bioinform..

[6]  E. Datema,et al.  The megabase-sized fungal genome of Rhizoctonia solani assembled from nanopore reads only , 2016, bioRxiv.

[7]  David Posada,et al.  Multilocus inference of species trees and DNA barcoding , 2016, Philosophical Transactions of the Royal Society B: Biological Sciences.

[8]  Richard M. Clark,et al.  The Arabidopsis lyrata genome sequence and the basis of rapid genome size change , 2011, Nature Genetics.

[9]  Matei David,et al.  Nanocall: an open source basecaller for Oxford Nanopore sequencing data , 2016, bioRxiv.

[10]  A. Drummond,et al.  Bayesian Inference of Species Trees from Multilocus Data , 2009, Molecular biology and evolution.

[11]  Oliver G. Pybus,et al.  Mobile real-time surveillance of Zika virus in Brazil , 2016, Genome Medicine.

[12]  Robert C. Edgar,et al.  MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.

[13]  Toni Gabaldón,et al.  trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses , 2009, Bioinform..

[14]  M. Frith,et al.  Adaptive seeds tame genomic sequence comparison. , 2011, Genome research.

[16]  Alexey A. Gurevich,et al.  QUAST: quality assessment tool for genome assemblies , 2013, Bioinform..

[17]  M. Suchard,et al.  Bayesian Phylogenetics with BEAUti and the BEAST 1.7 , 2012, Molecular biology and evolution.

[18]  J. Sese,et al.  Sequencing of the genus Arabidopsis identifies a complex history of nonbifurcating speciation and abundant trans-specific polymorphism , 2016, Nature Genetics.

[19]  Alexandros Stamatakis,et al.  RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies , 2014, Bioinform..

[20]  T Laver,et al.  Assessing the performance of the Oxford Nanopore Technologies MinION , 2015, Biomolecular detection and quantification.

[21]  Björn Usadel,et al.  Trimmomatic: a flexible trimmer for Illumina sequence data , 2014, Bioinform..

[22]  Cuong Q. Tang,et al.  The widely used small subunit 18S rDNA molecule greatly underestimates true diversity in biodiversity surveys of the meiofauna , 2012, Proceedings of the National Academy of Sciences.

[23]  D. Branton,et al.  Characterization of individual polynucleotide molecules using a membrane channel. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[24]  Steven J. M. Jones,et al.  ABySS: a parallel assembler for short read sequence data , 2009, Genome research.

[25]  Jun Sese,et al.  Genome-wide quantification of homeolog expression ratio revealed nonstochastic gene regulation in synthetic allopolyploid Arabidopsis , 2014, Nucleic acids research.

[26]  Damon P. Little,et al.  DNA Barcode Sequence Identification Incorporating Taxonomic Hierarchy and within Taxon Variability , 2011, PloS one.

[27]  T. Shors,et al.  A trip down memory lane about sex differences in the brain , 2016, Philosophical Transactions of the Royal Society B: Biological Sciences.

[28]  Thomas Lengauer,et al.  ROCR: visualizing classifier performance in R , 2005, Bioinform..

[29]  Saravanaraj N. Ayyampalayam,et al.  Phylotranscriptomic analysis of the origin and early diversification of land plants , 2014, Proceedings of the National Academy of Sciences.

[30]  Ian Korf,et al.  Gene finding in novel genomes , 2004, BMC Bioinformatics.

[31]  Tanya Z. Berardini,et al.  The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools , 2011, Nucleic Acids Res..

[32]  Mehrdad Hajibabaei,et al.  From writing to reading the encyclopedia of life , 2016, Philosophical Transactions of the Royal Society B: Biological Sciences.

[33]  Richard Durbin,et al.  Fast and accurate short read alignment with Burrows–Wheeler transform , 2009.

[34]  Yaniv Erlich.  A vision for ubiquitous sequencing , 2015, bioRxiv.

[35]  David A. Matthews,et al.  Real-time, portable genome sequencing for Ebola surveillance , 2016, Nature.

[36]  D. Baird,et al.  A new way to contemplate Darwin's tangled bank: how DNA barcodes are reconnecting biodiversity science and biomonitoring , 2016, Philosophical Transactions of the Royal Society B: Biological Sciences.

[37]  Dong Xie,et al.  BEAST 2: A Software Platform for Bayesian Evolutionary Analysis , 2014, PLoS Comput. Biol..

[38]  R. Cruickshank,et al.  The seven deadly sins of DNA barcoding , 2012, Molecular ecology resources.

[39]  G. van den Thillart,et al.  Rapid de novo assembly of the European eel genome from nanopore sequencing reads , 2017, bioRxiv.

[40]  A. Mikheyev,et al.  A first look at the Oxford Nanopore MinION sequencer , 2014, Molecular ecology resources.

[41]  Ning Ma,et al.  BLAST+: architecture and applications , 2009, BMC Bioinformatics.

[42]  Dmitry Antipov,et al.  hybridSPAdes: an algorithm for hybrid assembly of short and long reads , 2016, Bioinform..

[43]  W. John Kress,et al.  A DNA barcode for land plants , 2009, Proceedings of the National Academy of Sciences.