An enhanced beam search algorithm for the Shortest Common Supersequence Problem

Abstract: The Shortest Common Supersequence Problem asks for a shortest string that is a supersequence of every member of a given set of strings. Its applications include data compression and oligonucleotide microarray production. The problem is NP-hard, and existing exact algorithms are impractical for large instances. This paper proposes a new beam search algorithm for the problem, which employs a probabilistic heuristic and uses the dominance property to further prune the search space. The proposed algorithm is compared with three recent algorithms for the problem on both random and biological sequences. In every experimental case it outperforms all three, quickly producing solutions of higher average quality. The Java source and binary files of the proposed IBS_SCS algorithm, our implementation of the DR algorithm, and all the random and real datasets used in this paper are freely available upon request.
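To make the problem statement and the general shape of a beam search with dominance pruning concrete, the following is a minimal Python sketch. It is not the IBS_SCS algorithm from the paper: the function names `is_supersequence` and `beam_search_scs`, the `beam_width` parameter, and the ranking rule (fewest total remaining characters) are all assumptions made for illustration, and the ranking rule is a simple stand-in for the paper's probabilistic heuristic, which the abstract does not specify.

```python
from typing import Dict, List, Sequence, Tuple


def is_supersequence(candidate: str, s: str) -> bool:
    """Return True if s can be obtained from candidate by deleting characters."""
    it = iter(candidate)
    # `ch in it` advances the iterator, so characters are matched in order.
    return all(ch in it for ch in s)


def beam_search_scs(strings: Sequence[str], beam_width: int = 10) -> str:
    """Illustrative beam search for a short common supersequence.

    A state is a tuple of positions: how many characters of each input
    string the partial supersequence has already covered.
    """
    alphabet = sorted(set("".join(strings)))
    start: Tuple[int, ...] = tuple(0 for _ in strings)
    goal: Tuple[int, ...] = tuple(len(s) for s in strings)
    beam: List[Tuple[Tuple[int, ...], str]] = [(start, "")]

    while True:
        for state, prefix in beam:
            if state == goal:
                return prefix

        # Extend every beam state by every symbol that advances some string.
        candidates: Dict[Tuple[int, ...], str] = {}
        for state, prefix in beam:
            for c in alphabet:
                nxt = tuple(
                    p + 1 if p < len(s) and s[p] == c else p
                    for p, s in zip(state, strings)
                )
                if nxt != state and nxt not in candidates:
                    candidates[nxt] = prefix + c

        # Dominance pruning: all candidates at this level have prefixes of
        # equal length, so a state is discarded if another state has covered
        # at least as much of every input string.
        states = list(candidates)
        kept = [
            s for s in states
            if not any(
                t != s and all(a >= b for a, b in zip(t, s)) for t in states
            )
        ]

        # Stand-in heuristic (assumption): fewer remaining characters is better.
        kept.sort(
            key=lambda st: (
                sum(len(s) - p for s, p in zip(strings, st)),
                candidates[st],
            )
        )
        beam = [(st, candidates[st]) for st in kept[:beam_width]]


if __name__ == "__main__":
    inputs = ["abc", "bca"]
    result = beam_search_scs(inputs, beam_width=10)
    print(result, all(is_supersequence(result, s) for s in inputs))
```

Because every candidate at a given level has the same prefix length, comparing position tuples componentwise is enough to decide dominance here; a larger `beam_width` trades running time for a better chance of reaching a shortest supersequence.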
