De novo assembly of a new Solanum pennellii accession using nanopore sequencing

Recent updates in nanopore technology have made it possible to obtain gigabases of sequence data. Prior to this, nanopore sequencing was mainly used to analyze microbial samples. Here, we describe the generation of a comprehensive nanopore sequencing data set with a median read length of 11,979 bp for a self-compatible accession of the wild tomato species Solanum pennellii. We describe the assembly of its genome to a contig N50 of 2.5 Mb. The assembly pipeline comprised initial read correction with Canu and assembly with SMARTdenovo. The resulting raw nanopore-based de novo genome was structurally highly similar to that of the reference S. pennellii LA716 accession but had a high error rate and was rich in homopolymer deletions. After polishing the assembly with Illumina reads, we obtained an error rate of <0.02% when assessed against the same Illumina data. We obtained a gene completeness of 96.53%, slightly surpassing that of the reference S. pennellii. Taken together, our data indicate that such long-read sequencing data can be used to affordably sequence and assemble gigabase-sized plant genomes.
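For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal Python sketch of how it is commonly computed: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions, not part of the authors' pipeline or data.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are visited from longest to shortest; N50 is the length of
    the contig at which the running sum first reaches half of the total
    assembly length.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")

    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, not taken from the assembly).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)
```

With real data, the lengths would typically be read from the assembly FASTA file; an N50 of 2.5 Mb then means that half of the assembled bases lie in contigs of at least 2.5 Mb.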
