A fast and practical bit-vector algorithm for the Longest Common Subsequence problem

Abstract This paper presents a new practical bit-vector algorithm for the well-known Longest Common Subsequence (LCS) problem. Given two strings of lengths m and n, with n ⩾ m, the algorithm determines the length p of an LCS in O(nm/w) time and O(m/w) space, where w is the number of bits in a machine word. It can be thought of as a column-wise “parallelization” of the classical dynamic programming approach. The algorithm is very efficient in practice: when m ⩽ w, the length of an LCS of two strings can be computed in linear time, O(n), and constant additional working space.
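To make the column-wise idea concrete, the following is a minimal Python sketch of a bit-vector LCS-length computation. It is an illustration under stated assumptions, not necessarily the paper's exact formulation: the abstract does not give the update rule, so the sketch uses the recurrence V ← (V + (V AND M)) OR (V AND NOT M) that is commonly attributed to this line of work, where M is the match mask of the current character. The function name `lcs_length_bitvector` and the variable names are hypothetical. Python integers are arbitrary-precision, so a single integer stands in for the ⌈m/w⌉ machine words that a fixed-width implementation would need.

```python
def lcs_length_bitvector(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y (bit-vector sketch)."""
    # Keep the bit vector as short as possible: x plays the role of the
    # length-m string, y the length-n string, with n >= m.
    if len(x) > len(y):
        x, y = y, x
    m = len(x)
    if m == 0:
        return 0

    full = (1 << m) - 1  # m one-bits; emulates an m-bit machine word

    # Match masks: bit i of masks[c] is set iff x[i] == c.
    masks = {}
    for i, ch in enumerate(x):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    v = full  # all ones: no LCS contribution yet
    for ch in y:
        mc = masks.get(ch, 0)
        # Assumed update rule (see lead-in); one DP column per iteration.
        # The final mask discards carries that leave the m-bit window.
        v = ((v + (v & mc)) | (v & ~mc)) & full

    # Each zero bit among the low m bits marks one unit of LCS length.
    return m - bin(v).count("1")
```

For example, `lcs_length_bitvector("AGGTAB", "GXTXAYB")` processes one character of the longer string per loop iteration, each with a constant number of word operations when m ⩽ w, which is where the linear-time claim comes from.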
