Approximating Shortest Superstrings

The shortest-superstring problem is to find a shortest string that contains every string in a given set as a substring. The problem has applications in data compression and DNA sequencing. Because it is NP-hard and MAX SNP-hard, approximation algorithms are of interest. We present a new algorithm that always finds a superstring at most 2.89 times as long as the shortest superstring. This improves on the 3-approximation result of Blum et al.
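To make the problem statement concrete, the following is a minimal Python sketch of the classical greedy merging heuristic, which repeatedly merges the two strings with the largest overlap. It is included only to illustrate what a superstring is; it is not the 2.89-approximation algorithm announced above, and the function names `overlap` and `greedy_superstring` are illustrative assumptions rather than anything defined in the paper.

```python
from typing import Iterable, List


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings: Iterable[str]) -> str:
    """Illustrative greedy heuristic (hypothetical helper, not the paper's method).

    Returns some superstring of the input; it is not guaranteed to be shortest.
    """
    # Remove duplicates, then remove strings already contained in another string.
    items: List[str] = list(dict.fromkeys(strings))
    items = [s for s in items if not any(s != t and s in t for t in items)]

    while len(items) > 1:
        # Find the ordered pair (i, j) with the largest overlap.
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair, writing the shared part only once.
        merged = items[best_i] + items[best_j][best_k:]
        items = [s for idx, s in enumerate(items) if idx not in (best_i, best_j)]
        items.append(merged)

    return items[0] if items else ""


if __name__ == "__main__":
    fragments = ["abc", "bcd", "cde"]
    result = greedy_superstring(fragments)
    print(result, len(result))
```

For the three fragments in the usage example, the combined input length is 9, while a superstring that exploits the overlaps needs only 5 characters.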

[1]  T. Gingeras,et al.  Computer programs for the assembly of DNA sequences. , 1979, Nucleic acids research.

[2]  Tao Jiang,et al.  Linear approximation of shortest superstrings , 1994, JACM.

[3]  James A. Storer,et al.  Data compression via textual substitution , 1982, JACM.

[4]  Jonathan S. Turner,et al.  Approximation Algorithms for the Shortest Common Superstring Problem , 1989, Inf. Comput.

[5]  E. B. James,et al.  Information Compression by Factorising Common Strings , 1975, The Computer Journal.

[6]  Robert E. Tarjan,et al.  Data structures and network algorithms , 1983, CBMS-NSF regional conference series in applied mathematics.

[7]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1979, W. H. Freeman, San Francisco.

[8]  H. Wilf,et al.  Uniqueness theorems for periodic functions , 1965 .

[9]  Marvin B. Shapiro.  An Algorithm for Reconstructing Protein and RNA Sequences , 1967, JACM.

[10]  Arthur M. Lesk.  Computational Molecular Biology: Sources and Methods for Sequence Analysis , 1989 .

[11]  James A. Storer,et al.  Data Compression: Methods and Theory , 1987 .

[12]  Kenneth Steiglitz,et al.  Combinatorial Optimization: Algorithms and Complexity , 1981 .

[13]  Esko Ukkonen,et al.  A Greedy Approximation Algorithm for Constructing Shortest Common Superstrings , 1988, Theor. Comput. Sci.

[14]  Carsten Lund,et al.  Proof verification and hardness of approximation problems , 1992, Proceedings of the 33rd Annual Symposium on Foundations of Computer Science.

[15]  Hans Söderlund,et al.  Algorithms for Some String Matching Problems Arising in Molecular Genetics , 1983, IFIP Congress.

[16]  David Maier,et al.  On Finding Minimal Length Superstrings , 1980, J. Comput. Syst. Sci.