Finishing bacterial genome assemblies with Mix

Motivation: Two obstacles limit the benefits of genome assembly: unfinished assemblies and the experimental costs they entail. First, numerous software solutions for de novo genome assembly are available, each with its own advantages and drawbacks, and there are no clear guidelines for choosing among them. Second, these solutions produce draft assemblies that often require a resource-intensive finishing phase.

Methods: In this paper we address these two aspects by developing Mix, a tool that mixes two or more draft assemblies without relying on a reference genome, with the goal of reducing contig fragmentation and thus speeding up genome finishing. The proposed algorithm builds an extension graph in which vertices represent extremities of contigs and edges represent existing alignments between these extremities. These alignment edges are used for contig extension. The resulting output assembly corresponds to a set of paths in the extension graph that maximizes the cumulative contig length.

Results: We evaluate the performance of Mix on bacterial NGS data from the GAGE-B study and apply it to newly sequenced Mycoplasma genomes. The resulting final assemblies demonstrate a significant improvement in overall assembly quality. In particular, Mix is consistent, providing better overall quality results even when the choice is guided solely by standard assembly statistics, as is the case for de novo projects.

Availability: Mix is implemented in Python and is available at https://github.com/cbib/MIX; novel data for our Mycoplasma study are available at http://services.cbib.u-bordeaux2.fr/mix/.
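
The extension-graph idea described in the Methods can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the Mix implementation: it collapses each contig to a single vertex instead of modelling its two extremities, treats each alignment as a directed "left contig can be extended by right contig" edge, ignores overlap lengths when summing, and returns one best path rather than a set of paths. The names `build_extension_graph` and `best_path`, and the contig identifiers and lengths, are hypothetical. Because finding a longest simple path is hard in general, the sketch uses exhaustive search, which is only practical for very small graphs.

```python
from collections import defaultdict


def build_extension_graph(alignments):
    """Build a directed graph from (left, right) pairs.

    Each pair is assumed to mean: the end of contig `left` aligns with
    the start of contig `right`, so `left` can be extended by `right`.
    """
    graph = defaultdict(list)
    for left, right in alignments:
        graph[left].append(right)
    return graph


def best_path(graph, lengths):
    """Return (total_length, path) for the simple path whose contigs
    have the largest cumulative length (overlaps are not subtracted).

    Exhaustive depth-first search: exponential in the worst case.
    """
    best = (0, [])

    def extend(path, total):
        nonlocal best
        if total > best[0]:
            best = (total, list(path))
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # keep the path simple (no repeated contig)
                path.append(nxt)
                extend(path, total + lengths[nxt])
                path.pop()

    for start in lengths:
        extend([start], lengths[start])
    return best


# Hypothetical contigs from two draft assemblies, "a" and "b".
lengths = {"a1": 500, "a2": 300, "b1": 400, "b2": 250}
alignments = [("a1", "b1"), ("b1", "a2"), ("a1", "b2")]

total, path = best_path(build_extension_graph(alignments), lengths)
print(total, path)  # 1200 ['a1', 'b1', 'a2']
```

Here the path a1, b1, a2 alternates between the two assemblies and covers more sequence than any single contig, which is the intuition behind using alignment edges for contig extension.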
