Efficient de Bruijn graph construction for genome assembly using a hash table and auxiliary vector data structures

Modern next-generation sequencing technologies can generate huge volumes of data. One popular and useful tool for analyzing these data is the de Bruijn graph. In de Bruijn graph based genome assembly, the huge number of nodes makes memory and runtime the main barriers, and this problem has received significant attention in the contemporary literature. We present an algorithm that strikes a balance between memory and runtime. Our approach stores the de Bruijn graph in a hash table with an auxiliary data structure, which improves total memory usage and runtime with no false positives. In the whole assembly process, the graph construction step generally takes the major share of the time, and our approach offers a significant advancement in this step. All the data files (in FASTA format), along with the program code, are available for download at the following link: https://drive.google.com/folderview?id=0B3D-hZtRZ933V1dMOVBHUkNJM00&usp=sharing.
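For readers unfamiliar with the general idea, the following is a minimal sketch of exact (false-positive-free) de Bruijn graph construction with a hash table plus an auxiliary vector. It is not the authors' implementation: the function name `build_debruijn`, the choice of a Python `dict` as the hash table, a `list` as the auxiliary vector of node labels, and the edge-multiplicity map are all illustrative assumptions. Reverse complements are ignored, and k-mers containing characters other than A, C, G, T are skipped.

```python
def build_debruijn(reads, k):
    """Sketch (hypothetical): build a de Bruijn graph from reads.

    Nodes are (k-1)-mers, and each k-mer contributes an edge from its
    prefix (k-1)-mer to its suffix (k-1)-mer.
    Returns (nodes, node_index, edges):
      nodes      -- auxiliary vector: integer id -> (k-1)-mer string
      node_index -- hash table: (k-1)-mer string -> integer id
      edges      -- hash table: (src_id, dst_id) -> k-mer multiplicity
    """
    node_index = {}
    nodes = []
    edges = {}

    def node_id(label):
        # Exact lookup on the full string key, so there are no false
        # positives (unlike probabilistic structures such as Bloom filters).
        if label not in node_index:
            node_index[label] = len(nodes)
            nodes.append(label)
        return node_index[label]

    valid = set("ACGT")
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not set(kmer) <= valid:
                continue  # skip k-mers with ambiguous bases such as 'N'
            edge = (node_id(kmer[:-1]), node_id(kmer[1:]))
            edges[edge] = edges.get(edge, 0) + 1
    return nodes, node_index, edges


nodes, node_index, edges = build_debruijn(["ACGTAC", "CGTA"], 3)
print(nodes)   # ['AC', 'CG', 'GT', 'TA']
print(edges)   # {(0, 1): 1, (1, 2): 2, (2, 3): 2, (3, 0): 1}
```

Storing each node label once in the vector and referring to it by integer id elsewhere is one common way to keep edge storage compact; whether this matches the auxiliary structure used in the paper is not stated in this section.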