Improving the Worst-Case Performance of the Hunt-Szymanski Strategy for the Longest Common Subsequence of Two Strings

Abstract

Among the algorithms devised to date for finding the longest common subsequence of two strings, the one by Hunt and Szymanski exhibits the best known performance in favorable cases, but can be worse than any straightforward algorithm for a large variety of inputs. The new algorithm presented here follows a schedule of primitive operations quite close to the one inherent in the Hunt-Szymanski strategy, but with substantially enhanced efficiency. In fact, the new algorithm improves on the former in two important respects. First, its worst case is never worse than linear in the product mn of the lengths of the two input strings. Second, its time bound does not always grow with the cardinality r of the set R of all pairs of matching positions of the input strings. Rather, it depends on the cardinality d of a specific subset of R, whose elements are called here dominant matches and are referred to elsewhere as minimal candidates. This second improvement also appears significant, since it seems that whenever r gets too close to mn, d is forced to be linear in m. The new algorithm requires standard preprocessing and makes use of finger trees. A forthcoming paper will show, among other things, that the same performance can be achieved with simpler and handier auxiliary data structures.
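To make the quantities r and d concrete, here is a minimal Python sketch of the classic Hunt-Szymanski threshold strategy. It is not the finger-tree algorithm presented in this paper. It assumes 0-based positions and the usual definition of a k-dominant match: a pair (i, j) with L(i, j) = k, L(i-1, j) < k and L(i, j-1) < k, where L is the table of LCS lengths of prefixes. The function name hunt_szymanski and its returned triple are illustrative choices, not taken from the paper. The sketch performs one binary search per element of R, but the levels whose threshold drops during row i are in one-to-one correspondence with the dominant matches of that row, so it can count both r and d.

```python
from bisect import bisect_left


def hunt_szymanski(a: str, b: str) -> tuple[int, int, int]:
    """Return (LCS length, r, d) for strings a and b.

    Hypothetical helper sketching the Hunt-Szymanski threshold strategy
    (not the finger-tree algorithm of the paper).
    r: number of matching pairs (i, j) with a[i] == b[j].
    d: number of dominant matches, counted as threshold changes per row.
    """
    # Standard preprocessing: for each symbol, its positions in b in decreasing order.
    occ: dict[str, list[int]] = {}
    for j in range(len(b) - 1, -1, -1):
        occ.setdefault(b[j], []).append(j)

    # After row i, thresh[k] is the smallest j such that a[:i+1] and b[:j+1]
    # share a common subsequence of length k + 1; the list is strictly increasing.
    thresh: list[int] = []
    r = d = 0
    for ch in a:
        changed: set[int] = set()  # levels whose threshold drops in this row
        # Scanning j in decreasing order prevents a[i] from being used twice
        # in the same common subsequence.
        for j in occ.get(ch, []):
            r += 1  # every (i, j) visited here is an element of R
            k = bisect_left(thresh, j)  # first level with thresh[k] >= j
            if k == len(thresh):
                thresh.append(j)  # a longer common subsequence now ends at j
                changed.add(k)
            elif j < thresh[k]:
                thresh[k] = j  # same length, earlier end position in b
                changed.add(k)
        # Each lowered level yields exactly one dominant match (i, thresh[k]),
        # even if several matches of this row touched that level.
        d += len(changed)
    return len(thresh), r, d


print(hunt_szymanski("abcbdab", "bdcaba"))  # (4, 12, 9)
print(hunt_szymanski("aaaaa", "aaaaa"))     # (5, 25, 5)
```

In the second call every pair of positions matches, so r = mn = 25, yet only the five diagonal pairs (i, i) are dominant, which illustrates how d can stay linear in m when r approaches mn. Because the sketch still does one binary search per element of R, its running time grows with r; the abstract's point is that the new algorithm's bound depends on d instead.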

Alberto Apostolico

References

[1] Alfred V. Aho et al. Bounds on the Complexity of the Longest Common Subsequence Problem, 1976, J. ACM.

[2] James W. Hunt and Thomas G. Szymanski. A Fast Algorithm for Computing Longest Common Subsequences, 1977, Commun. ACM.

[3] Mark R. Brown and Robert E. Tarjan. A Fast Merging Algorithm, 1979, J. ACM.

[4] Daniel S. Hirschberg. A Linear Space Algorithm for Computing Maximal Common Subsequences, 1975, Commun. ACM.

[5] Daniel S. Hirschberg. Algorithms for the Longest Common Subsequence Problem, 1977, J. ACM.

[6] Robert E. Tarjan et al. A Representation for Linear Lists with Movable Fingers, 1978, STOC.

[7] Peter van Emde Boas. Preserving Order in a Forest in Less Than Logarithmic Time and Linear Space, 1977, Inf. Process. Lett.

[8] Kurt Mehlhorn. Data Structures and Algorithms 1: Sorting and Searching, 2011, EATCS Monographs on Theoretical Computer Science.