Paper information - Computational Performance Assessment of k-mer Counting Algorithms.


Abstract This article assesses several k-mer counting tools in order to give bioinformatics researchers a reference framework for identifying the computational requirements, parallelization, advantages, disadvantages, and bottlenecks of the algorithm behind each tool. The k-mer counters evaluated were BFCounter, DSK, Jellyfish, KAnalyze, KHMer, KMC2, MSPKmerCounter, Tallymer, and Turtle. The measured parameters were RAM usage, processing time, parallelization, and disk read and write access. The dataset consisted of 36,504,800 reads corresponding to human chromosome 14. The assessment was performed for two k-mer lengths: 31 and 55. The results were as follows: tools based purely on Bloom filters and tools using disk-partitioning techniques used less RAM. The tools with the shortest execution times were those that used disk-partitioning techniques. The techniques that achieved the most parallelization were the ones that ...
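For readers unfamiliar with the task these tools perform, the following is a minimal sketch of naive in-memory k-mer counting with a hash table. It is not the method of any tool named in the abstract; those rely on strategies such as Bloom filters and disk partitioning precisely because this approach uses too much RAM on large datasets. The function names `canonical` and `count_kmers` are hypothetical, and treating a k-mer and its reverse complement as the same key is an assumption made here for illustration.

```python
from collections import Counter

# Complement table for DNA bases (assumes reads use only A, C, G, T, N).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers across all reads using an in-memory hash table."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of length k over the read, one base at a time.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:  # skip windows containing ambiguous bases
                continue
            counts[canonical(kmer)] += 1
    return counts


if __name__ == "__main__":
    # Toy input; the study itself used k = 31 and k = 55 on real reads.
    for kmer, n in sorted(count_kmers(["ACGTACGT"], 3).items()):
        print(kmer, n)
```

Memory in this sketch grows with the number of distinct k-mers, which is the cost that the RAM measurements in the abstract compare across tools.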

Nelson Pérez | Miguel Gutierrez | Nelson Vera

[1] Qingpeng Zhang, et al. These Are Not the K-mers You Are Looking For: Efficient Online K-mer Counting Using a Probabilistic Data Structure, 2013, PLoS ONE.

[2] Nir Shavit, et al. An optimistic approach to lock-free FIFO queues, 2004, Distributed Computing.

[3] Enno Ohlebusch, et al. Replacing suffix trees with enhanced suffix arrays, 2004, J. Discrete Algorithms.

[4] Fredrik Vannberg, et al. KAnalyze: a fast versatile pipelined K-mer toolkit, 2014, Bioinform.

[5] Carl Kingsford, et al. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers, 2011, Bioinform.

[6] Yang Li, et al. Memory Efficient Minimum Substring Partitioning, 2013, Proc. VLDB Endow.

[7] S. Kurtz, et al. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes, 2008, BMC Genomics.

[8] Burton H. Bloom, et al. Space/time trade-offs in hash coding with allowable errors, 1970, CACM.

[9] Graham Cormode, et al. An improved data stream summary: the count-min sketch and its applications, 2004, J. Algorithms.

[10] Sebastian Deorowicz, et al. KMC 2: Fast and resource-frugal k-mer counting, 2014, Bioinform.

[11] Michael Roberts, et al. A Preprocessor for Shotgun Assembly of Large Genomes, 2004, J. Comput. Biol.

[12] Páll Melsted, et al. Efficient counting of k-mers in DNA sequences using a bloom filter, 2011, BMC Bioinformatics.

[13] Dominique Lavenier, et al. DSK: k-mer counting with very low memory usage, 2013, Bioinform.