New approximation algorithms for longest common subsequences

This paper focuses on finding approximations for the longest common subsequence (LCS) of two strings. Most methods that compute an approximation for the more general problem of N (N ≥ 3) input strings typically give trivial results for the restricted case under study. Because few reliable heuristics exist, we introduce several new ones in this survey. The majority of the presented algorithms give a lower bound for the LCS. They can therefore be used, for example, as a filter to decide quickly whether a more detailed, space- and time-consuming study is needed. A lower bound can also be used to limit the search space of an exact LCS method effectively. The upper bounds complement the information about the true LCS; they form a basis for judging the reliability of a lower bound. Extensive tests have been carried out to show the strengths of the heuristics, and their role in various environments is discussed.
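
To illustrate how a lower and an upper bound can work together as a filter, here is a minimal Python sketch. It does not reproduce the heuristics of the paper: the greedy lower bound, the character-count upper bound, and the names `lcs_lower_bound`, `lcs_upper_bound`, `lcs_exact`, and `lcs_at_least` are all assumptions chosen for illustration.

```python
from collections import Counter


def lcs_lower_bound(a: str, b: str) -> int:
    """Greedy lower bound: one left-to-right pass that builds a common subsequence.

    Illustrative heuristic only; not one of the paper's algorithms.
    """
    pos = 0      # next position in b that is still available for matching
    length = 0
    for ch in a:
        nxt = b.find(ch, pos)
        if nxt != -1:        # ch occurs in the unused suffix of b, so take it
            length += 1
            pos = nxt + 1
    return length


def lcs_upper_bound(a: str, b: str) -> int:
    """Upper bound: a common subsequence cannot use a symbol more often
    than it occurs in either string."""
    count_a, count_b = Counter(a), Counter(b)
    return sum(min(n, count_b[ch]) for ch, n in count_a.items())


def lcs_exact(a: str, b: str) -> int:
    """Standard O(|a|*|b|) dynamic program, kept to two rows of the table."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch == bj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_at_least(a: str, b: str, threshold: int) -> bool:
    """Filter: answer from the cheap bounds when they are decisive,
    and fall back to the exact computation only otherwise."""
    if lcs_lower_bound(a, b) >= threshold:
        return True
    if lcs_upper_bound(a, b) < threshold:
        return False
    return lcs_exact(a, b) >= threshold
```

For the strings "ABCBDAB" and "BDCABA", this sketch gives a lower bound of 3 and an upper bound of 6, while the exact LCS length is 4. A gap of that size between the two bounds is the kind of signal that the lower bound alone may not be reliable for that input.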
