An Optimized Algorithm for Finding Approximate Tandem Repeats in DNA Sequences

In gene analysis, finding approximate tandem repeats in DNA sequences is an important problem. MSATR is one of the most recent methods for finding such repeats, but it suffers from high runtime cost and poor result quality. This paper proposes an optimized algorithm, mMSATR, for detecting approximate tandem repeats in genomic sequences more efficiently. By introducing the definition of CASM to reduce the search scope and by optimizing the original mechanism adopted by MSATR, mMSATR makes the detection process more efficient and improves result quality. Theoretical analysis and experimental results indicate that mMSATR finds more results in less runtime. mMSATR is superior to other methods in the results it finds, and it greatly reduces runtime cost, which is beneficial as gene data sets grow larger.