Chromosome‐level hybrid de novo genome assemblies as an attainable option for nonmodel insects

The emergence of third‐generation sequencing (3GS; long reads) is bringing closer the goal of chromosome‐size fragments in de novo genome assemblies. This opens new and broader questions on genome evolution to a number of nonmodel organisms. However, long‐read technologies have higher sequencing error rates, so reaching sufficient quality requires high, and therefore costly, coverage. In this context, hybrid assemblies, which combine short and long reads, offer an efficient and cost‐effective alternative for generating chromosome‐level de novo genome assemblies. The array of software programs for hybrid genome assembly, sequence correction and manipulation is constantly being expanded and improved. This makes it difficult for nonexperts to find efficient, fast and tractable computational solutions for genome assembly, especially for nonmodel organisms that lack a reference genome, or one from a closely related species. In this study, we review and test the most recent pipelines for hybrid assembly, comparing the model organism Drosophila melanogaster with a nonmodel cactophilic species, D. mojavensis. We show that the dbg2olc pipeline can achieve excellent contiguity in this nonmodel organism.
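Two quantities underlie this trade‐off. Sequencing coverage is the mean depth: total sequenced bases divided by genome size. Assembly contiguity is commonly summarized with the N50 statistic, the contig length L such that contigs of length at least L together cover half or more of the total assembly. The Python sketch below computes both. It is illustrative only and is not the evaluation code used in this study. The contig lengths and read totals in the usage example are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together account for at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # reached half of the assembly length
            return length
    return 0  # empty assembly


def mean_coverage(total_read_bases, genome_size):
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return total_read_bases / genome_size


# Hypothetical toy values, not data from this study.
print(n50([100, 80, 50, 30, 20, 10]))  # 80
print(mean_coverage(13_000_000_000, 130_000_000))  # 100.0
```

In the toy assembly, the two longest contigs (100 + 80 = 180 bp) already cover more than half of the 290 bp total, so the N50 is 80. The second call shows that 13 Gb of reads over a hypothetical 130 Mb genome corresponds to 100× mean coverage.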
