An Optimized Algorithm for Finding Approximate Tandem Repeats in DNA Sequences

In gene analysis, finding approximate tandem repeats in DNA sequences is an important problem. MSATR is one of the latest methods for finding such repeats, but it suffers from high runtime cost and poor result quality. This paper proposes mMSATR, an optimized algorithm that detects approximate tandem repeats in genomic sequences more efficiently. By introducing the definition of CASM to narrow the search scope and by optimizing the original mechanism adopted by MSATR, mMSATR speeds up detection and improves result quality. Theoretical analysis and experimental results indicate that mMSATR finds more results in less runtime. mMSATR outperforms other methods in the results it finds, and it greatly reduces runtime cost, which becomes increasingly beneficial as genomic datasets grow larger.
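
The abstract does not specify how MSATR, mMSATR or CASM work, so the sketch below does not attempt to reproduce them. It only makes the underlying problem concrete: a brute-force scan for approximate tandem repeats of one fixed period. It assumes a simple mismatch model (each copy may differ from the first copy in at most a given number of positions, measured by Hamming distance); the function names and parameters are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def naive_approx_tandem_repeats(seq: str, period: int, max_mismatches: int,
                                min_copies: int = 2) -> list[tuple[int, int, str]]:
    """Hypothetical brute-force illustration; NOT the paper's mMSATR.

    Assumed model: a run starting at `start` is extended copy by copy while
    each new length-`period` block differs from the first block (the seed
    motif) in at most `max_mismatches` positions. Runs with at least
    `min_copies` copies are reported as (start, copies, seed_motif).
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    results = []
    n = len(seq)
    start = 0
    while start + period <= n:
        seed = seq[start:start + period]
        copies = 1
        pos = start + period
        # Extend the run while the next block stays close to the seed motif.
        while pos + period <= n and hamming(seq[pos:pos + period], seed) <= max_mismatches:
            copies += 1
            pos += period
        if copies >= min_copies:
            results.append((start, copies, seed))
            start = pos  # continue scanning after the reported run
        else:
            start += 1
    return results


# "ACGT" + "ACGA" (1 mismatch) + "ACGT" form one approximate repeat; "TTTT" breaks it.
print(naive_approx_tandem_repeats("ACGTACGAACGTTTTT", period=4, max_mismatches=1, min_copies=3))
# → [(0, 3, 'ACGT')]
```

A scan like this has to be repeated for every candidate period and compares blocks character by character, so it is meant only to illustrate the definition, not as an efficient detector.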