New Algorithms for the LCS Problem

The LCS problem is to determine a longest common subsequence (LCS) of two symbol sequences. Two algorithms which improve two existing results, respectively, are presented. Let m, n be the lengths of the two input strings, with m ≤ n, p being the length of the LCS, and s being the number of distinct symbols appearing in the two strings. It is shown that the first algorithm presented requires at most O(n log s) preprocessing time and O(pm log(n/m) + pm) processing time to solve the problem. This bound is better than that of previous algorithms, especially when n is much greater than m. The algorithm also exhibits desirable properties under conditions of sparse matches. The second scheme achieves essentially the same bound (O(pm log(n/p) + pm)) by employing efficient merging methods in the computations. It also outperforms existing algorithms designed for sparsely-matched situations. Together, the two algorithms provide interesting contrasts of different approaches to one problem; they also
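To make the quantities m, n, p, and s concrete, the following is a minimal sketch of the textbook O(mn) dynamic-programming solution. It is not either of the two algorithms summarized above, only a baseline against which their bounds can be read. The function names `lcs` and `parameters` are hypothetical and chosen for illustration.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b.

    Textbook O(mn) dynamic programming, used here only as a baseline;
    it is NOT one of the algorithms described in the abstract.
    """
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def parameters(a: str, b: str) -> dict:
    """Compute the abstract's parameters for a pair of strings (hypothetical helper)."""
    shorter, longer = sorted((a, b), key=len)
    return {
        "m": len(shorter),          # length of the shorter input
        "n": len(longer),           # length of the longer input, so m <= n
        "p": len(lcs(a, b)),        # length of an LCS
        "s": len(set(a) | set(b)),  # distinct symbols across both inputs
    }
```

For the inputs "ABCBDAB" and "BDCABA", `parameters` gives m = 6, n = 7, p = 4, and s = 4. The baseline always fills the full (m + 1) by (n + 1) table, whereas the bounds stated above depend on p, so they can be much smaller than mn when the LCS is short.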
