Paper information - Efficient de Bruijn graph construction for genome assembly using a hash table and auxiliary vector data structures

Modern next-generation sequencing technologies can generate huge volumes of data. One popular and useful tool for analyzing this data is the so-called de Bruijn graph. Because of the huge number of nodes, memory and runtime are the main barriers in de Bruijn graph based genome assembly, and this area has been the focus of significant attention in the contemporary literature. We present an algorithm that strikes a balance between memory and runtime. Our approach stores the de Bruijn graph in a hash table with an auxiliary data structure, which improves total memory usage and runtime with no false positives. Graph construction generally takes the major share of the time in the whole assembly process, and our approach presents a significant advance in this respect. All the data files (in FASTA format) along with the program code are available for download at the following link: https://drive.google.com/folderview?id=0B3D-hZtRZ933V1dMOVBHUkNJM00&usp=sharing.
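To make the idea of a hash-table-backed de Bruijn graph concrete, the following is a minimal sketch and not the authors' algorithm. It assumes nodes are k-mers held in a Python `set` (an exact hash table) and that edges are implicit: two k-mers are connected when they overlap by k-1 characters. The function names `build_kmer_table` and `successors` are hypothetical, and the auxiliary vector structure mentioned in the abstract is not reproduced here because the abstract does not describe it.

```python
from typing import Iterable, List, Set


def build_kmer_table(reads: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer of every read into an exact hash set (illustrative only)."""
    kmers: Set[str] = set()
    for read in reads:
        # Slide a window of width k over the read.
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def successors(kmers: Set[str], node: str) -> List[str]:
    """Return the k-mers reachable from `node` by a (k-1)-overlap edge.

    Edges are not stored; each of the four possible extensions is
    looked up in the hash set.
    """
    suffix = node[1:]
    return [suffix + base for base in "ACGT" if suffix + base in kmers]


if __name__ == "__main__":
    table = build_kmer_table(["ACGTAC", "GTACG"], k=3)
    print(sorted(table))
    print(successors(table, "ACG"))
```

Because membership in the set is exact, a lookup never reports a k-mer that was not present in the reads. This differs from Bloom-filter-based representations, which can return false positives unless additional information is stored.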

Mahfuzer Rahman Limon | Ratul Sharker | Sajib Biswas | M. Sohel Rahman
