Paper information - Readjoiner: a fast and memory efficient string graph-based sequence assembler

Readjoiner: a fast and memory efficient string graph-based sequence assembler

Background: Ongoing improvements in throughput of the next-generation sequencing technologies challenge the current generation of de novo sequence assemblers. Most recent sequence assemblers are based on the construction of a de Bruijn graph. An alternative framework of growing interest is the assembly string graph, which does not require dividing the reads into k-mers but does require fast algorithms for computing suffix-prefix matches among all pairs of reads.

Results: Here we present efficient methods for the construction of a string graph from a set of sequencing reads. Our approach employs suffix sorting and scanning methods to compute suffix-prefix matches. Transitive edges are recognized and eliminated early in the process, and the graph is efficiently constructed from irreducible edges only.

Conclusions: Our suffix-prefix match determination and string graph construction algorithms have been implemented in the software package Readjoiner. Comparison with existing string graph-based assemblers shows that Readjoiner is faster and more space efficient. Readjoiner is available at http://www.zbh.uni-hamburg.de/readjoiner.
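To make the terms in the abstract concrete, here is a minimal Python sketch of a suffix-prefix match, an overlap graph, and the removal of transitive edges. It is an illustration only, not Readjoiner's method: the abstract describes suffix sorting and scanning, whereas this sketch compares every pair of reads directly in quadratic time. The function names, the `min_len` parameter, and the simplified transitivity test (an edge u -> w is dropped whenever some v has edges u -> v and v -> w) are all assumptions made for this example.

```python
from itertools import permutations


def longest_overlap(a, b, min_len):
    """Length of the longest proper suffix of a that is a prefix of b.

    Returns 0 if no such match of at least min_len characters exists.
    """
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def overlap_graph(reads, min_len):
    """Naive all-pairs overlap graph: edges[i][j] = overlap length of read i -> read j."""
    edges = {}
    for i, j in permutations(range(len(reads)), 2):
        length = longest_overlap(reads[i], reads[j], min_len)
        if length:
            edges.setdefault(i, {})[j] = length
    return edges


def remove_transitive_edges(edges):
    """Keep only edges u -> w for which no v with u -> v and v -> w exists.

    Simplified test (assumption): it ignores overlap coordinates, so it is
    only meant for small repeat-free toy inputs.
    """
    reduced = {}
    for u, targets in edges.items():
        keep = {
            w: length
            for w, length in targets.items()
            if not any(w in edges.get(v, {}) for v in targets if v != w)
        }
        if keep:
            reduced[u] = keep
    return reduced


# Three toy reads sampled from the string ACGTACGGTT.
reads = ["ACGTAC", "GTACGG", "ACGGTT"]
edges = overlap_graph(reads, min_len=2)
print(edges)                           # {0: {1: 4, 2: 2}, 1: {2: 4}}
print(remove_transitive_edges(edges))  # {0: {1: 4}, 1: {2: 4}}
```

In the toy input, the edge from read 0 to read 2 is implied by the path through read 1, so only the two irreducible edges remain. A real string graph construction must also handle reverse complements, contained reads, and repeats, all of which this sketch ignores.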

Stefan Kurtz | Giorgio Gonnella

[1] Edward Fredkin, et al. Trie memory, 1960, Commun. ACM.

[2] Eugene W. Myers, et al. Suffix arrays: a new method for on-line string searches, 1990, SODA '90.

[3] Gad M. Landau, et al. An Efficient Algorithm for the All Pairs Suffix-Prefix Problem, 1992, Inf. Process. Lett.

[4] Jon Louis Bentley, et al. Engineering a sort function, 1993, Softw. Pract. Exp.

[5] Eugene W. Myers, et al. Toward Simplifying and Accurately Formulating Fragment Assembly, 1995, J. Comput. Biol.

[6] Roberto Grossi, et al. The string B-tree: a new data structure for string search in external memory and its applications, 1999, JACM.

[7] P. Pevzner, et al. An Eulerian path approach to DNA fragment assembly, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[8] S. Salzberg, et al. Versatile and open software for comparing large genomes, 2004, Genome Biology.

[9] S. Salzberg, et al. Hierarchical scaffolding with Bambus, 2003, Genome research.

[10] Enno Ohlebusch, et al. Replacing suffix trees with enhanced suffix arrays, 2004, J. Discrete Algorithms.

[11] Eugene W. Myers, et al. The fragment assembly string graph, 2005, ECCB/JBI.

[12] E. Birney, et al. Velvet: algorithms for de novo short read assembly using de Bruijn graphs, 2008, Genome research.

[13] Mark J. P. Chaisson, et al. Short read fragment assembly of bacterial genomes, 2008, Genome research.

[14] David Hernández, et al. De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer, 2008, Genome research.

[15] Juha Kärkkäinen, et al. Engineering Radix Sort for Strings, 2008, SPIRE.

[16] John D McPherson, et al. Next-generation gap, 2009, Nature Methods.

[17] Steven J. M. Jones, et al. ABySS: a parallel assembler for short read sequence data, 2009, Genome research.

[18] Jared T. Simpson, et al. Efficient construction of an assembly string graph using the FM-index, 2010, Bioinform.

[19] David R. Kelley, et al. Quake: quality-aware detection and correction of sequencing errors, 2010, Genome Biology.