Improved Length Bounds for the Shortest Superstring Problem (Extended Abstract)

Given a collection of strings S = {s_1, ..., s_n} over an alphabet Σ, a superstring α of S is a string containing each s_i as a substring; that is, for each i, 1 ≤ i ≤ n, α contains a block of |s_i| consecutive characters that match s_i exactly. The shortest superstring problem is the problem of finding a superstring α of minimum length. This problem is NP-hard [6] and has applications in computational biology and data compression. The first O(1)-approximation algorithms were given in [2]. We describe our 2 3/4-approximation algorithm, which is the best known. While our algorithm is not complex, our analysis requires some novel machinery to describe overlapping periodic strings. We then show how to combine our result with that of [11] to obtain a ratio of 2 50/69 ≈ 2.725. We describe an implementation of our algorithm which runs in O(|S| + n^3) time; this matches the running time of previous O(1)-approximations.
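To make the definition concrete, the following minimal Python sketch checks the superstring property and builds a (not necessarily shortest) superstring by repeatedly merging the pair of strings with the largest overlap. This is the classic greedy merging heuristic, used here only as an illustration; it is not the 2 3/4-approximation algorithm described in this abstract, and the function names `is_superstring`, `overlap`, and `greedy_superstring` are hypothetical.

```python
def is_superstring(alpha, strings):
    """Return True if every string in `strings` occurs as a substring of alpha."""
    return all(s in alpha for s in strings)


def overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Illustrative greedy heuristic (an assumption, not the paper's algorithm)."""
    unique = set(strings)
    # Strings contained in another string are covered automatically; drop them.
    items = sorted(s for s in unique
                   if not any(s != t and s in t for t in unique))
    while len(items) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the best pair, writing the overlapping characters only once.
        merged = items[best_i] + items[best_j][best_k:]
        items = [s for idx, s in enumerate(items)
                 if idx not in (best_i, best_j)] + [merged]
    return items[0] if items else ""


example = ["abc", "bcd", "cde"]
result = greedy_superstring(example)
print(result, is_superstring(result, example))
```

On the input `["abc", "bcd", "cde"]` the sketch merges overlapping pairs until a single string containing all three inputs remains. The quadratic scan for the best pair is chosen for readability and makes no attempt to match the O(|S| + n^3) running time stated above.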

[1]  Wojciech Rytter, et al. Parallel and Sequential Approximations of Shortest Superstrings, 1994, SWAT.

[2]  Ming Li, et al. Towards a DNA sequencing theory (learning a string), 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.

[3]  David Maier, et al. On Finding Minimal Length Superstrings, 1980, J. Comput. Syst. Sci.

[4]  Clifford Stein, et al. Short Superstrings and the Structure of Overlapping Strings, 1995, J. Comput. Biol.

[5]  James A. Storer, et al. Data Compression: Methods and Theory, 1987.

[6]  Dan Gusfield. Faster Implementation of a Shortest Superstring Approximation, 1994, Inf. Process. Lett.

[8]  Gad M. Landau, et al. An Efficient Algorithm for the All Pairs Suffix-Prefix Problem, 1992, Inf. Process. Lett.

[9]  Tao Jiang, et al. Linear approximation of shortest superstrings, 1994, JACM.

[10]  Graham A. Stephen. String Searching Algorithms, 1994, Lecture Notes Series on Computing.

[11]  Clifford Stein, et al. Long tours and short superstrings, 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[12]  Alan M. Frieze, et al. On the worst-case performance of some algorithms for the asymmetric traveling salesman problem, 1982, Networks.

[13]  J. Kececioglu. Exact and approximation algorithms for DNA sequence reconstruction, 1992.

[14]  F. Frances Yao, et al. Approximating shortest superstrings, 1997, Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science.

[15]  Hans Söderlund, et al. Algorithms for Some String Matching Problems Arising in Molecular Genetics, 1983, IFIP Congress.

[16]  Arthur M. Lesk. Computational Molecular Biology: Sources and Methods for Sequence Analysis, 1989.

[17]  Donald E. Knuth, et al. Fast Pattern Matching in Strings, 1977, SIAM J. Comput.

[18]  Jonathan S. Turner, et al. Approximation Algorithms for the Shortest Common Superstring Problem, 1989, Inf. Comput.

[19]  Esko Ukkonen, et al. A Greedy Approximation Algorithm for Constructing Shortest Common Superstrings, 1988, Theor. Comput. Sci.

[20]  Kenneth Steiglitz, et al. Combinatorial Optimization: Algorithms and Complexity, 1981.