A Versatile Divide and Conquer Technique for Optimal String Alignment

Abstract

Common string alignment algorithms, such as the basic dynamic programming algorithm (DPA) and the time-efficient Ukkonen algorithm, use quadratic space to determine an alignment between two strings. In this paper we present a technique that can be applied to these algorithms to obtain an alignment using only linear space, with little or no effect on the time complexity. The technique has several advantages over previous methods for determining alignments in linear space: it is simple, essentially the same technique works with different cost functions, and in practice it makes it easy to trade available memory for running time.
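To make the space argument concrete, the following is a minimal sketch of one way a divide-and-conquer step can recover an alignment while keeping only two rows of the DPA matrix at a time. It assumes unit edit costs (insert, delete, and substitute each cost 1) and is not necessarily the exact formulation used in the paper. The function names `crossing_column` and `align` are hypothetical, chosen for illustration. Each cell carries, alongside its cost, the column at which an optimal path to it left the middle row; that column splits the problem into two independent halves.

```python
def crossing_column(a, b):
    """Return (cost, c): the unit-cost edit distance between a and b, and the
    column c at which one optimal path leaves row mid = len(a) // 2.
    Only two rows of costs and crossing columns are held at any time."""
    mid = len(a) // 2
    m = len(b)
    cost = list(range(m + 1))    # row 0: insert the first j characters of b
    cross = list(range(m + 1))   # meaningful only from row mid onwards
    for i in range(1, len(a) + 1):
        new_cost = [i] + [0] * m
        new_cross = [0] * (m + 1)
        for j in range(1, m + 1):
            sub = cost[j - 1] + (a[i - 1] != b[j - 1])
            dele = cost[j] + 1
            ins = new_cost[j - 1] + 1
            best = min(sub, dele, ins)
            new_cost[j] = best
            # Inherit the crossing column from whichever predecessor was used.
            if best == sub:
                new_cross[j] = cross[j - 1]
            elif best == dele:
                new_cross[j] = cross[j]
            else:
                new_cross[j] = new_cross[j - 1]
        if i == mid:
            # On the middle row every cell is its own crossing point.
            new_cross = list(range(m + 1))
        cost, cross = new_cost, new_cross
    return cost[m], cross[m]


def align(a, b):
    """Return an optimal alignment of a and b as two gapped strings ('-' marks
    a gap), found by recursive splitting at the crossing column."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: place the single character on a matching character of b
        # if one exists, otherwise substitute it for b[0].
        k = max(b.find(a), 0)
        return "-" * k + a + "-" * (len(b) - k - 1), b
    mid = len(a) // 2
    _, c = crossing_column(a, b)
    left_a, left_b = align(a[:mid], b[:c])
    right_a, right_b = align(a[mid:], b[c:])
    return left_a + right_a, left_b + right_b


if __name__ == "__main__":
    top, bottom = align("kitten", "sitting")
    print(top)
    print(bottom)
```

In this sketch the working storage per call is proportional to the length of `b`, and the recursion depth is logarithmic in the length of `a`. Keeping more than one checkpoint row per pass would reduce the amount of recomputation at the price of more memory, which is one way to picture the memory-for-time trade mentioned above.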
