SLR-superscaffolder: a de novo scaffolding tool for synthetic long reads using a top-to-bottom scheme

Synthetic long reads (SLR) with long-range co-barcoding information have recently been developed and widely applied in genomics research. We proposed a scaffolding model of the co-barcoding information and developed a scaffolding tool that adopts a top-to-bottom scheme to make full use of the complementary information in SLR datasets, together with a screening algorithm to reduce the negative effects of misassembled contigs in the input assembly. Compared with other available SLR scaffolding tools, our tool achieved the best quality improvement across different input assemblies, especially those assembled from next-generation sequencing reads, where contiguity improved by several hundred-fold.
