Shortest Path Approaches for the Longest Common Subsequence of a Set of Strings

We investigate the k-LCS problem, that is, finding a longest common subsequence (LCS) of k given input strings. Practical solutions are known for k = 2, but the problem is not well explored for higher dimensions. We consider the algorithms by Miller and Myers and by Wu et al., which solve the 2-LCS problem, and shed new light on their generalization to higher dimensions. First, we redesign both algorithms so that the generalization to higher dimensions becomes natural. Then we present our algorithms for solving the k-LCS problem. We further propose a new approach to reduce the algorithms' space complexity. We demonstrate that our algorithms are practical, as they significantly outperform the dynamic programming approaches. Our results stand in contrast to observations made in previous work by Irving and Fraser.
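
For orientation, the dynamic programming baseline mentioned above can be sketched as follows. This is a minimal illustration of the textbook k-dimensional recurrence, not the algorithms proposed in the paper; the function name `k_lcs` and its structure are assumptions made for this example. It fills a table with one cell per index tuple, so both time and space grow with the product of the string lengths, which is the cost the shortest path approaches aim to avoid.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def k_lcs(strings: Sequence[str]) -> str:
    """Return one longest common subsequence of all given strings.

    Illustrative baseline only (hypothetical helper, not from the paper).
    """
    strings = list(strings)
    if not strings:
        return ""
    k = len(strings)
    dims = [len(s) for s in strings]

    def step_back(idx: Tuple[int, ...], d: int) -> Tuple[int, ...]:
        # Index tuple with coordinate d decreased by one.
        return idx[:d] + (idx[d] - 1,) + idx[d + 1:]

    # length[idx] = LCS length of the prefixes strings[j][:idx[j]].
    length: Dict[Tuple[int, ...], int] = {}
    # product() yields tuples in lexicographic order, so every
    # predecessor of a cell is filled before the cell itself.
    for idx in product(*(range(n + 1) for n in dims)):
        if 0 in idx:
            length[idx] = 0
            continue
        last_chars = {s[i - 1] for s, i in zip(strings, idx)}
        if len(last_chars) == 1:
            # All prefixes end in the same symbol: extend the diagonal.
            length[idx] = length[tuple(i - 1 for i in idx)] + 1
        else:
            # Otherwise drop the last symbol of one of the prefixes.
            length[idx] = max(length[step_back(idx, d)] for d in range(k))

    # Trace back from the full-length corner to recover one LCS.
    out = []
    idx = tuple(dims)
    while 0 not in idx:
        last_chars = {s[i - 1] for s, i in zip(strings, idx)}
        if len(last_chars) == 1:
            out.append(last_chars.pop())
            idx = tuple(i - 1 for i in idx)
        else:
            idx = max((step_back(idx, d) for d in range(k)),
                      key=length.__getitem__)
    return "".join(reversed(out))
```

For example, `k_lcs(["abcde", "ace", "aXcYe"])` walks a 6 x 4 x 6 table. Adding a fourth string of length n multiplies the table size by n + 1, which is why this baseline becomes impractical quickly as k grows.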

[1]  Eugene W. Myers, et al. An O(NP) Sequence Comparison Algorithm, 1990, Inf. Process. Lett.

[2]  Daniel S. Hirschberg, et al. A linear space algorithm for computing maximal common subsequences, 1975, Commun. ACM.

[3]  Fred R. McMorris, et al. The computation of consensus patterns in DNA sequences, 1993.

[4]  Mourad Elloumi, et al. Comparison of Strings Belonging to the Same Family, 1998, Inf. Sci.

[5]  David Sankoff, et al. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, 1983.

[6]  Stephen Y. Itoga. The string merging problem, 1981, BIT.

[7]  Robert W. Irving, et al. Two Algorithms for the Longest Common Subsequence of Three (or More) Strings, 1992, CPM.

[8]  Michael S. Waterman, et al. Introduction to computational biology, 1995.

[9]  Thomas B. Martin, et al. Automatic Speech and Speaker Recognition, 1979.

[10]  Eugene W. Myers, et al. A file comparison program, 1985, Softw. Pract. Exp.

[11]  Michael J. Fischer, et al. The String-to-String Correction Problem, 1974, JACM.

[12]  Sergey Bereg, et al. RNA multiple structural alignment with longest common subsequences, 2005, J. Comb. Optim.

[13]  Hiroshi Imai, et al. The Longest Common Subsequence Problem for Small Alphabet Size Between Many Strings, 1992, ISAAC.

[14]  Susan R. Wilson. Introduction to Computational Biology: Maps, Sequences and Genomes, 1996.

[15]  Ricardo A. Baeza-Yates, et al. Searching Subsequences, 1991, Theor. Comput. Sci.

[16]  James C. French, et al. Applications of approximate word matching in information retrieval, 1997, CIKM '97.

[17]  Jeffrey Scott Vitter, et al. Shortest paths in Euclidean graphs, 2005, Algorithmica.

[18]  Graham A. Stephen. String Searching Algorithms, 1994, Lecture Notes Series on Computing.

[19]  Saul B. Needleman, et al. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins, 1970, J. Mol. Biol.

[20]  Hon Wai Leong, et al. Finding Patterns in Biological Sequences by Longest Common Subsequences and Shortest Common Supersequences, 2006, Sixth IEEE Symposium on BioInformatics and BioEngineering (BIBE'06).

[21]  김동규, et al. [Book Review] "Algorithms on Strings, Trees, and Sequences", 2000.

[22]  David Maier, et al. The Complexity of Some Problems on Subsequences and Supersequences, 1978, JACM.

[23]  L. Bergroth, et al. A survey of longest common subsequence algorithms, 2000, Proceedings Seventh International Symposium on String Processing and Information Retrieval. SPIRE 2000.

[24]  G. R. Cross, et al. An improved algorithm to find the length of the longest common subsequence of two strings, 1989, SIGF.

[25]  M. W. Du, et al. Computing a longest common subsequence for a set of strings, 1984, BIT.