CAR: contig assembly of prokaryotic draft genomes using rearrangements

Background: Next-generation sequencing technology has allowed efficient production of draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs, whose relative positions and orientations along the genome being sequenced are unknown. Although several tools have been developed to order and orient the contigs of draft genomes, more accurate tools are still needed.

Results: In this study, we present a novel reference-based contig assembly (or scaffolding) tool, named CAR, that can efficiently and more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome of a related organism. Given a set of contigs in multi-FASTA format and a reference genome in FASTA format, CAR outputs a list of scaffolds, each of which is a set of ordered and oriented contigs. For validation, we tested CAR on a real dataset composed of several prokaryotic genomes and compared its performance with several other reference-based contig assembly tools. Our experimental results show that CAR performs better than all of these tools in terms of sensitivity, precision and genome coverage.

Conclusions: CAR serves as an efficient tool that can more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome. The web server of CAR is freely available at http://genome.cs.nthu.edu.tw/CAR/ and its stand-alone program can also be downloaded from the same website.
