An efficient algorithm for mapping next-generation sequencing reads to a reference genome

One of the main problems in genomics is finding similarities between different species represented by DNA sequences. Dynamic programming algorithms (Needleman-Wunsch, Smith-Waterman) give a good measure of similarity, but they are not efficient for large data sets. In this study we present a new heuristic algorithm based on common parts of reads. The approach can handle all types of sequencing errors: insertions, deletions and substitutions. The results of our algorithm are similar to those of other well-known tools. The presented algorithm is implemented in C++, uses the Boost libraries, and internally uses threads for parallel computing. The algorithm is part of the DNA assembler 'dnaasm'. Source code, a demo application and supplementary materials are available at the project homepage: http://dnaasm.sourceforge.net.
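
To illustrate why dynamic programming becomes costly on large inputs, the following is a minimal sketch of Smith-Waterman local alignment scoring. It is not taken from dnaasm and is not the heuristic presented here. The function name `smithWatermanScore` and the scoring values (match 2, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration. The nested loops fill one cell per pair of positions, so aligning a read of length m against a reference of length n takes time proportional to m·n.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Smith-Waterman local alignment score with a linear gap penalty.
// Scoring values are illustrative assumptions, not parameters from dnaasm.
int smithWatermanScore(const std::string& a, const std::string& b,
                       int match = 2, int mismatch = -1, int gap = -2) {
    // Only two rows of the dynamic programming table are kept in memory.
    std::vector<int> prev(b.size() + 1, 0);
    std::vector<int> curr(b.size() + 1, 0);
    int best = 0;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = 0;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            // Align a[i-1] with b[j-1] (match or substitution).
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            // Gap in b (a[i-1] has no partner).
            int up = prev[j] + gap;
            // Gap in a (b[j-1] has no partner).
            int left = curr[j - 1] + gap;
            // A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = std::max({0, diag, up, left});
            best = std::max(best, curr[j]);
        }
        std::swap(prev, curr);
    }
    return best;
}
```

With these assumed scores, two identical sequences of length 4 score 8, and two sequences with no base in common score 0. Keeping only two rows reduces memory to O(n), but the running time stays quadratic, which is the motivation for heuristics that first locate shared parts of the sequences.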