Paper information - Efficient algorithms for the longest common subsequence in k-length substrings

Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. It generalizes the well-known LCS problem: instead of matching single symbols of two sequences A and B, it matches non-overlapping substrings of length k from A and B. We propose several algorithms for LCSk that are non-trivial adaptations of the major techniques known from LCS research (dynamic programming, sparse dynamic programming, tabulation). Our algorithms rely on a linear-time, linear-space preprocessing step that finds the occurrences of all length-k substrings of one sequence in the other.

Szymon Grabowski | Sebastian Deorowicz
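The LCSk definition in the abstract can be made concrete with a small sketch. The code below is not one of the paper's algorithms: `lcsk` is a plain quadratic dynamic program written only to illustrate the problem, and `kmer_matches` uses a dictionary of k-length substrings as a simple stand-in for the linear-time preprocessing the abstract mentions (it is not linear in the worst case). Both function names are hypothetical.

```python
from collections import defaultdict


def kmer_matches(a: str, b: str, k: int) -> list[tuple[int, int]]:
    """List start pairs (i, j) with a[i:i+k] == b[j:j+k].

    Illustrative stand-in for the paper's preprocessing; it uses a
    dictionary index rather than a linear-time structure.
    """
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(a) - k + 1)
            for j in index.get(a[i:i + k], [])]


def lcsk(a: str, b: str, k: int) -> int:
    """Maximum number of non-overlapping length-k substrings that match
    between a and b in the same order (plain O(|a| * |b|) dynamic program)."""
    if k <= 0:
        raise ValueError("k must be positive")
    n, m = len(a), len(b)
    # suf[i][j]: length of the longest common suffix of a[:i] and b[:j]
    suf = [[0] * (m + 1) for _ in range(n + 1)]
    # dp[i][j]: LCSk value for the prefixes a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                suf[i][j] = suf[i - 1][j - 1] + 1
            best = max(dp[i - 1][j], dp[i][j - 1])
            # A k-length match ends at (i, j): extend the solution that
            # ends before both matched substrings start.
            if suf[i][j] >= k:
                best = max(best, dp[i - k][j - k] + 1)
            dp[i][j] = best
    return dp[n][m]
```

With `k = 1` the function reduces to the classic LCS length, which matches the abstract's description of LCSk as a generalization of LCS.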
