Next-Generation Sequence Assembly Overview

Next-generation sequence assembly can be viewed as a five-stage data-processing pipeline, with each stage posing its own computational challenges. The stages are error correction, graph construction, graph simplification, scaffolding, and assembly assessment. Each stage receives its input from the preceding stage and passes its output to the next, and together they produce the final assembled sequences. This chapter briefly introduces the basic function of each stage and provides a coherent framework for the communication between them.
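The hand-off between the five stages can be illustrated with a minimal Python sketch. It is not taken from the chapter: every function name, the k-mer size, and the toy logic inside each stage are assumptions chosen only to show how the output of one stage becomes the input of the next. In particular, the error-correction and scaffolding stages are crude placeholders (real tools use k-mer spectra, paired-end links, and so on).

```python
from collections import Counter, defaultdict

def error_correction(reads):
    # Toy placeholder: discard reads with ambiguous bases instead of correcting them.
    return [r for r in reads if set(r) <= set("ACGT")]

def graph_construction(reads, k=4):
    # Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers.
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

def graph_simplification(graph):
    # Collapse unbranched paths into contigs (isolated cycles are ignored here).
    indeg = Counter(t for targets in graph.values() for t in targets)
    nodes = set(graph) | set(indeg)

    def is_inner(node):
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if is_inner(node):
            continue
        for nxt in sorted(graph.get(node, ())):
            path = node
            while True:
                path += nxt[-1]
                if not is_inner(nxt):
                    break
                nxt = next(iter(graph[nxt]))
            contigs.append(path)
    return contigs

def scaffolding(contigs):
    # Toy placeholder: order contigs by length; no paired-end information is used.
    return sorted(contigs, key=len, reverse=True)

def assembly_assessment(scaffolds):
    # Report simple contiguity statistics, including N50.
    lengths = sorted((len(s) for s in scaffolds), reverse=True)
    total = sum(lengths)
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {"scaffolds": scaffolds, "count": len(lengths),
            "total_length": total, "n50": n50}

# Each stage consumes the output of the one before it.
STAGES = [error_correction, graph_construction, graph_simplification,
          scaffolding, assembly_assessment]

def run_pipeline(reads):
    data = reads
    for stage in STAGES:
        data = stage(data)
    return data

report = run_pipeline(["ACGTTG", "GTTGCA", "ACGNTG"])
print(report)
```

In this toy run the read containing `N` is dropped, and the two remaining overlapping reads are merged into a single sequence through the graph stages. Keeping the stages as a plain list of functions makes the data flow explicit and lets any single stage be replaced without touching the others.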
