Is the whole greater than the sum of its parts? De novo assembly strategies for bacterial genomes based on paired-end sequencing

BackgroundWhole genome sequence construction is becoming increasingly feasible because of advances in next generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the assembly process. However, the influences of different library sizes and assembly methods on paired-end sequencing-based de novo assembly remain poorly understood.ResultsWe used 250 bp Illumina Miseq paired-end reads of different library sizes generated from genomic DNA from Escherichia coli DH1 and Streptococcus parasanguinis FW213 to compare the assembly results of different library sizes and assembly approaches. Our data indicate that overlapping paired-end reads can increase read accuracy but sometimes cause insertion or deletions. Regarding genome assembly, merged reads only outcompete original paired-end reads when coverage depth is low, and larger libraries tend to yield better assembly results. These results imply that distance information is the most critical factor during assembly. Our results also indicate that when depth is sufficiently high, assembly from subsets can sometimes produce better results.ConclusionsIn summary, this study provides systematic evaluations of de novo assembly from paired end sequencing data. Among the assembly strategies, we find that overlapping paired-end reads is not always beneficial for bacteria genome assembly and should be avoided or used with caution especially for genomes containing high fraction of repetitive sequences. Because increasing numbers of projects aim at bacteria genome sequencing, our study provides valuable suggestions for the field of genomic sequence construction.

[1]  R. Burne,et al.  Streptococcus salivarius urease: genetic and biochemical characterization and expression in a dental plaque streptococcus , 1996, Infection and immunity.

[2]  Mark J. P. Chaisson,et al.  De novo fragment assembly with short mate-paired reads: Does the read length matter? , 2009, Genome research.

[3]  C. Nusbaum,et al.  ALLPATHS: de novo assembly of whole-genome shotgun microreads. , 2008, Genome research.

[4]  C. Ponting,et al.  Sequencing depth and coverage: key considerations in genomic analyses , 2014, Nature Reviews Genetics.

[5]  Abhay Jere,et al.  Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data , 2013, PloS one.

[6]  Rajiv C. McCoy,et al.  Illumina TruSeq Synthetic Long-Reads Empower De Novo Assembly and Resolve Complex, Highly-Repetitive Transposable Elements , 2014, bioRxiv.

[7]  S. Koren,et al.  Assembly algorithms for next-generation sequencing data. , 2010, Genomics.

[8]  Bairong Shen,et al.  A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies , 2011, PloS one.

[9]  Hanlee P. Ji,et al.  Next-generation DNA sequencing , 2008, Nature Biotechnology.

[10]  Josephine A. Reinhardt,et al.  De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae. , 2009, Genome research.

[11]  Siu-Ming Yiu,et al.  IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth , 2012, Bioinform..

[12]  G. Benson,et al.  Tandem repeats finder: a program to analyze DNA sequences. , 1999, Nucleic acids research.

[13]  Mark J. P. Chaisson,et al.  Short read fragment assembly of bacterial genomes. , 2008, Genome research.

[14]  David Hernández,et al.  De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. , 2008, Genome research.

[15]  S. Koren,et al.  One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. , 2015, Current opinion in microbiology.

[16]  Jiajie Zhang,et al.  PEAR: a fast and accurate Illumina Paired-End reAd mergeR , 2013, Bioinform..

[17]  Sergey I. Nikolenko,et al.  SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing , 2012, J. Comput. Biol..

[18]  Niall J. Haslam,et al.  An analysis of the feasibility of short read sequencing , 2005, Nucleic acids research.

[19]  Alexey A. Gurevich,et al.  QUAST: quality assessment tool for genome assemblies , 2013, Bioinform..

[20]  Daniel G. Brown,et al.  PANDAseq: paired-end assembler for illumina sequences , 2012, BMC Bioinformatics.

[21]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[22]  Huanming Yang,et al.  De novo assembly of human genomes with massively parallel short read sequencing. , 2010, Genome research.

[23]  David R. Kelley,et al.  Quake: quality-aware detection and correction of sequencing errors , 2010, Genome Biology.

[24]  Jian Wang,et al.  SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler , 2012, GigaScience.

[25]  M. Schatz,et al.  Hybrid error correction and de novo assembly of single-molecule sequencing reads , 2012, Nature Biotechnology.

[26]  T. Dallman,et al.  Performance comparison of benchtop high-throughput sequencing platforms , 2012, Nature Biotechnology.

[27]  E. Arner,et al.  Correcting errors in shotgun sequences. , 2003, Nucleic acids research.

[28]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[29]  I. Rigoutsos,et al.  Evaluation of Methods for De Novo Genome Assembly from High-Throughput Sequencing Reads Reveals Dependencies That Affect the Quality of the Results , 2011, PloS one.

[30]  Alexander S. Mikheyev,et al.  A first look at the Oxford Nanopore MinION sequencer , 2014, Molecular ecology resources.

[31]  Aaron A. Klammer,et al.  Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data , 2013, Nature Methods.

[32]  Tatiana A. Tatusova,et al.  RefSeq microbial genomes database: new representation and annotation strategy , 2013, Nucleic Acids Res..

[33]  P. Pevzner,et al.  An Eulerian path approach to DNA fragment assembly , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[34]  Steven Salzberg,et al.  BIOINFORMATICS ORIGINAL PAPER , 2004 .