Approaches and Challenges of Next-Generation Sequence Assembly Stages

Sequence assembly in the next-generation environment is broken down into five stages, all of which were introduced in Chap. 8. Here, we discuss four of these stages in detail and present the different approaches followed in each. We also examine the challenges that face each stage and its stage-specific implementation approaches. The fifth stage, the assessment of the assembly, is discussed separately in Chap. 10.
