Tandem repeats analysis in DNA sequences based on improved Burrows-Wheeler transform

The enormous number of short reads generated by new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including Mapping and Assembly with Quality (MAQ), which is accurate, feature-rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for aligning longer reads, where indels may occur frequently. The speed of MAQ is also a concern when alignment is scaled up to the resequencing of hundreds of individuals. We therefore carried out an in-depth performance analysis of BWA, a popular aligner based on the Burrows-Wheeler transform (BWT), and found that it performs significantly better than MAQ, although it has drawbacks regarding execution speed, time complexity and accuracy. Based on these factors, we implemented an improved Burrows-Wheeler Alignment algorithm (BWA), a new read alignment package in which the original BWT is optimized with the Ziv-Lempel (LZ-77) sliding window technique and prefix trie string matching, to efficiently search for exact and inexact matches on tandem repeats against a large reference genome. Our analysis shows that the improved BWA is approximately 1.40× faster than MAQ-32 while achieving higher accuracy, with percent confidence of 96.7% and 93.0%. Moreover, it searches for exact and inexact matches more efficiently, with a percent error of 0.05% for single-end reads and 0.04% for paired-end reads, and it is also more effective at finding left and right overlapping tandem repeats, with a percent confidence of 88.9%.
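To make the underlying idea concrete, the following is a minimal sketch of plain BWT backward search for exact matching of a read against a reference. It is an illustration under stated assumptions, not the improved BWA described above: the LZ-77 sliding window, the prefix trie and inexact matching are all omitted, the suffix array is built by naive sorting, and the function names `bwt` and `backward_search` are hypothetical.

```python
def bwt(text):
    """Return (BWT string, suffix array) of text with a '$' sentinel appended.

    Naive construction by sorting all suffixes; fine for short examples only.
    """
    text += "$"  # '$' sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character for suffix i is the character preceding it
    # (text[-1] == '$' when i == 0).
    return "".join(text[i - 1] for i in sa), sa


def backward_search(bwt_str, sa, pattern):
    """Return sorted start positions of exact matches of pattern."""
    # C[c] = number of characters in the text strictly smaller than c.
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    def occ(c, i):
        # Occurrences of c in bwt_str[:i]; a real index would precompute this.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # half-open suffix array interval [lo, hi)
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return []
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


# Hypothetical reference containing the tandem repeat (AC)x3.
reference = "GATTACACACAGT"
bwt_str, sa = bwt(reference)
hits = backward_search(bwt_str, sa, "ACA")
print(hits)
```

Each loop iteration narrows the suffix array interval to the suffixes prefixed by a longer suffix of the read, so the number of steps depends on the read length rather than the reference length. Overlapping hits inside a tandem repeat are all reported, since each occupies its own suffix array row.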