Improved bounds on the average length of longest common subsequences

It has long been known [Chvátal and Sankoff 1975] that the average length of the longest common subsequence of two random strings of length <i>n</i> over an alphabet of size <i>k</i> is asymptotic to γ<sub><i>k</i></sub><i>n</i> for some constant γ<sub><i>k</i></sub> depending on <i>k</i>. The values of these constants remain unknown, and a number of papers have proved upper and lower bounds on them. We discuss techniques, involving numerical calculations with recurrences on many variables, for determining lower and upper bounds on these constants. To our knowledge, the previous best-known lower and upper bounds for γ<sub>2</sub> were those of Dančík and Paterson, approximately 0.773911 and 0.837623 [Dančík 1994; Dančík and Paterson 1995]. We improve these to 0.788071 and 0.826280. This upper bound is less than the value of γ<sub>2</sub> given by Steele's old conjecture (see Steele [1997, page 3]) that γ<sub>2</sub> = 2/(1 + √2) ≈ 0.828427. (As Steele points out, experimental evidence had already suggested that this conjectured value was too high.) Finally, we show that the upper bound technique described here could be used to produce, for any <i>k</i>, a sequence of upper bounds converging to γ<sub><i>k</i></sub>, though the computation time grows very quickly as better bounds are guaranteed.
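To make the quantity γ<sub><i>k</i></sub> concrete, the following is a minimal Python sketch that estimates the ratio of average longest-common-subsequence length to <i>n</i> by random sampling and evaluates Steele's conjectured value numerically. It is not the recurrence-based technique of the paper, only a simple Monte Carlo illustration; the names lcs_length, estimate_ratio and STEELE_CONJECTURE, and the default trial count and seed, are assumptions introduced here.

```python
import math
import random


def lcs_length(a, b):
    """Length of a longest common subsequence of sequences a and b.

    Standard dynamic programme, keeping only one previous row of the table.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_ratio(n, k=2, trials=20, seed=0):
    """Monte Carlo estimate of E[LCS length] / n for two uniformly random
    strings of length n over an alphabet of size k (illustrative only)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(k) for _ in range(n)]
        b = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)


# Steele's conjectured value for gamma_2, as quoted in the abstract.
STEELE_CONJECTURE = 2 / (1 + math.sqrt(2))

if __name__ == "__main__":
    print("conjectured gamma_2:", round(STEELE_CONJECTURE, 6))
    print("sampled ratio at n=200:", estimate_ratio(200))
```

Because the expected length is superadditive in <i>n</i>, the ratio at any finite <i>n</i> is at most γ<sub><i>k</i></sub>, so sampled values of this kind approach the constant from below in expectation and cannot by themselves certify an upper bound.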

[1]  Vladimír Dančík, et al. Expected length of longest common subsequences, 1994.

[2]  Svante Janson, et al. Random graphs, 2000, Wiley-Interscience series in discrete mathematics and optimization.

[3]  Tao Jiang, et al. On the Approximation of Shortest Common Supersequences and Longest Common Subsequences, 1995, SIAM J. Comput.

[4]  Russ Bubley, et al. Randomized algorithms, 1995, CSUR.

[5]  E. Seneta. Non-negative Matrices and Markov Chains, 2008.

[6]  Alberto Apostolico, et al. The longest common subsequence problem revisited, 1987, Algorithmica.

[7]  Gonzalo Navarro, et al. Bounding the Expected Length of Longest Common Subsequences and Forests, 1999, Theory of Computing Systems.

[8]  Michael L. Overton, et al. Numerical Computing with IEEE Floating Point Arithmetic, 2001.

[9]  Daniel S. Hirschberg, et al. Algorithms for the Longest Common Subsequence Problem, 1977, JACM.

[10]  W. T. Gowers, et al. Random Graphs (Wiley Interscience Series in Discrete Mathematics and Optimization), 2001.

[11]  V. Chvátal, et al. Longest common subsequences of two random sequences, 1975, Advances in Applied Probability.

[12]  David Sankoff, et al. Longest common subsequences of two random sequences, 1975, Advances in Applied Probability.

[13]  Guy L. Steele, et al. The Java Language Specification, 1996.

[14]  Guy L. Steele, et al. Java(TM) Language Specification, 2005.

[15]  Jiří Matoušek, et al. Expected Length of the Longest Common Subsequence for Large Alphabets, 2003, LATIN.

[16]  Alan M. Frieze, et al. Random graphs, 2006, SODA '06.

[17]  Guy L. Steele, et al. Java(TM) Language Specification, The (3rd Edition) (Java (Addison-Wesley)), 2005.

[18]  Svante Janson, et al. Random graphs, 2000, ZOR Methods Model. Oper. Res.

[19]  Mike Paterson, et al. A Faster Algorithm Computing String Edit Distances, 1980, J. Comput. Syst. Sci.

[20]  Charles R. MacCluer, et al. The Many Proofs and Applications of Perron's Theorem, 2000, SIAM Rev.

[21]  Mike Paterson, et al. Longest Common Subsequences, 1994, MFCS.

[22]  Rajeev Motwani, et al. Randomized algorithms, 1996, CSUR.

[23]  Pavel A. Pevzner, et al. Computational molecular biology: an algorithmic approach, 2000.

[24]  Valerie Isham, et al. Non-Negative Matrices and Markov Chains, 1983.

[25]  J. Steele. Probability theory and combinatorial optimization, 1987.

[26]  Kenneth S. Alexander, et al. The Rate of Convergence of the Mean Length of the Longest Common Subsequence, 1994.

[27]  Dan Suciu, et al. Journal of the ACM, 2006.

[28]  Mike Paterson, et al. Upper Bounds for the Expected Length of a Longest Common Subsequence of Two Binary Sequences, 1995, Random Struct. Algorithms.

[29]  Marcos A. Kiwi, et al. On a Speculated Relation Between Chvátal–Sankoff Constants of Several Sequences, 2008, Combinatorics, Probability and Computing.

[30]  Tao Jiang, et al. On the Approximation of Shortest Common Supersequences and Longest Common Subsequences, 1994, SIAM J. Comput.

[31]  E. Seneta. Non-negative Matrices and Markov Chains (Springer Series in Statistics), 1981.