CSAR: a contig scaffolding tool using algebraic rearrangements

Summary: Advances in next-generation sequencing have generated massive amounts of short reads, but assembling genome sequences from them remains challenging. Because of errors in reads and large repeats in the genome, many current assembly tools produce only collections of contigs whose relative positions and orientations along the genome being sequenced are unknown. A scaffolding step that orders and orients the contigs of a draft genome is therefore needed to complete the genome sequence. In this work, we propose a new scaffolding tool, CSAR, that efficiently and more accurately orders and orients the contigs of a given draft genome based on a reference genome of a related organism. In particular, the reference genome required by CSAR need not be a complete sequence. Our experimental results on real datasets show that CSAR outperforms similar tools such as Projector2, OSLay and Mauve Aligner in terms of average sensitivity, precision, F-score, genome coverage, NGA50 and running time.

Availability and implementation: CSAR can be downloaded from https://github.com/ablab-nthu/CSAR.

Contact: hchiu@mail.ncku.edu.tw or cllu@cs.nthu.edu.tw.

Supplementary information: Supplementary data are available at Bioinformatics online.
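
To make the sensitivity, precision and F-score named in the Summary concrete, the following is a minimal Python sketch of how such scores could be computed for a scaffolding result. It is not taken from CSAR. It assumes that a scaffold is a list of `(contig_name, orientation)` pairs with orientation `+1` or `-1`, that a "join" is an adjacency between two consecutive oriented contigs, and that a predicted join counts as correct when the same adjacency occurs in the true order, read on either strand. The function names `joins` and `score` are hypothetical.

```python
def joins(scaffold):
    """Return the set of contig adjacencies in one scaffold.

    Each adjacency is stored in a canonical form so that a scaffold and
    its reverse complement yield the same set (assumed convention).
    """
    result = set()
    for (a, oa), (b, ob) in zip(scaffold, scaffold[1:]):
        forward = ((a, oa), (b, ob))
        # Reading the same adjacency from the opposite strand swaps the
        # order of the two contigs and flips both orientations.
        reverse = ((b, -ob), (a, -oa))
        result.add(min(forward, reverse))
    return result


def score(predicted, reference):
    """Compute (sensitivity, precision, F-score) over contig joins.

    `predicted` and `reference` are lists of scaffolds.
    """
    pred = set().union(*(joins(s) for s in predicted))
    ref = set().union(*(joins(s) for s in reference))
    correct = len(pred & ref)
    sensitivity = correct / len(ref) if ref else 0.0
    precision = correct / len(pred) if pred else 0.0
    if correct == 0:
        return sensitivity, precision, 0.0
    # The F-score is the harmonic mean of precision and sensitivity.
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, precision, f_score


# Toy example: the true order has three joins; the predicted scaffolds
# recover two of them (on the opposite strand) and miss the c1-c2 join.
reference = [[("c1", 1), ("c2", 1), ("c3", -1), ("c4", 1)]]
predicted = [[("c4", -1), ("c3", 1), ("c2", -1)], [("c1", 1)]]
sensitivity, precision, f_score = score(predicted, reference)
```

In this toy case both predicted joins are correct, so precision is 1, while sensitivity is 2/3 because one of the three true joins is missing. The exact definition of a correct join used in the paper's evaluation may differ from the assumption made here.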
