Next generation sequencing under de novo genome assembly

Next generation sequencing (NGS) is an important process that makes it possible to generate and organize very large raw sequence datasets at lower cost than traditional sequencing methods. The main stages of NGS are template preparation, sequencing and imaging, and genome alignment and assembly. For assembly, the de Bruijn graph (dBG) is an important mathematical tool that represents graphically how groups of nucleotides are ordered and oriented relative to one another. In essence, a dBG describes how genome segments are formed in a circular, iterative fashion. Several pivotal dBG-based de novo algorithms and software packages, namely T-IDBA, Oases, IDBA-tran, Euler, Velvet, ABySS, ALLPATHS, SOAPdenovo and SOAPdenovo2, are described in this paper. Overlap layout consensus (OLC) graph-based algorithms also play a vital role in NGS assembly, and some important OLC-based algorithms, namely MIRA3, CABOG, Newbler, Edena, Mosaik and SHORTY, are described as well. Experiments indicate that greedy graph-based algorithms and software packages are also important for correct assembly of genome datasets; SSAKE, SHARCGS and VCAKE are examples of this class.
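As a rough illustration of the dBG idea mentioned above, the following minimal sketch builds a de Bruijn graph from a few toy reads and spells a contig along an unambiguous path. It is not the method of any of the listed assemblers: the function names `build_de_bruijn` and `spell_unambiguous_path`, the toy reads, and the choice of k = 3 are all assumptions made for illustration, and real assemblers additionally handle sequencing errors, reverse complements, and repeats.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Hypothetical helper for illustration; returns a dict mapping each
    (k-1)-mer prefix to a sorted list of distinct (k-1)-mer successors.
    """
    edges = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in edges.items()}


def spell_unambiguous_path(graph, start):
    """Extend a contig from `start` while each node has exactly one successor.

    Stops at a branch, a dead end, or a node already visited (a cycle).
    """
    contig = start
    node = start
    visited = {start}
    while node in graph and len(graph[node]) == 1:
        node = graph[node][0]
        if node in visited:
            break
        visited.add(node)
        contig += node[-1]  # each step adds one new nucleotide
    return contig


# Two overlapping toy reads (assumed data, not from the paper).
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, 3)
contig = spell_unambiguous_path(graph, "AC")
print(graph)
print(contig)
```

In this toy case the two reads overlap by four bases, so the graph is a single unbranched path and the walk recovers one contig covering both reads. Branching nodes, which arise from repeats or errors, are where the assemblers named above apply their own heuristics.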
