On the Use of Decision Diagrams for Finding Repetition-Free Longest Common Subsequences

We consider the repetition-free longest common subsequence (RFLCS) problem, where the goal is to find a longest sequence that appears as a subsequence in two input strings and in which each character appears at most once. Our approach transforms an RFLCS instance into an instance of the maximum independent set (MIS) problem, which is subsequently solved by a mixed integer linear programming solver. To reduce the size of the underlying conflict graph of the MIS problem, a relaxed decision diagram is utilized. An experimental evaluation on two benchmark instance sets shows that reducing the conflict graphs leads to shorter total computation times and to more instances solved to proven optimality. A further advantage of the relaxed decision diagrams is that heuristic solutions can be derived from them effectively. For some instances that could not be solved to proven optimality, new state-of-the-art results were obtained in this way.
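
The transformation from RFLCS to MIS described above can be illustrated with a minimal sketch. It rests on assumptions not spelled out in the abstract: each node of the conflict graph is taken to be a pair of matching positions in the two strings, and two nodes are taken to conflict if they carry the same character or cannot both occur in one common subsequence. The function names are hypothetical, and a brute-force search stands in for the mixed integer linear programming solver, so the sketch is only suitable for very short strings. The relaxed decision diagram reduction is not shown.

```python
from itertools import combinations


def build_conflict_graph(x, y):
    """Build an (assumed) conflict graph for the RFLCS instance (x, y)."""
    # Nodes: all index pairs (i, j) whose characters match.
    nodes = [(i, j) for i, a in enumerate(x) for j, b in enumerate(y) if a == b]
    edges = set()
    for u, v in combinations(range(len(nodes)), 2):
        (i1, j1), (i2, j2) = nodes[u], nodes[v]
        same_char = x[i1] == x[i2]
        # Two matches are compatible only if both indices strictly
        # increase together, i.e. the matches do not cross or overlap.
        ordered = (i1 < i2 and j1 < j2) or (i2 < i1 and j2 < j1)
        if same_char or not ordered:
            edges.add((u, v))
    return nodes, edges


def max_independent_set_bruteforce(n, edges):
    """Exhaustive MIS search; a stand-in for the MILP solver."""
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            if not any(u in chosen and v in chosen for u, v in edges):
                return list(subset)
    return []


def rflcs(x, y):
    """Return one repetition-free longest common subsequence of x and y."""
    nodes, edges = build_conflict_graph(x, y)
    best = max_independent_set_bruteforce(len(nodes), edges)
    # Sorting the chosen matches by position recovers the subsequence.
    matches = sorted(nodes[k] for k in best)
    return "".join(x[i] for i, _ in matches)
```

For example, `rflcs("abab", "abab")` returns a string of length 2, because the ordinary longest common subsequence `abab` repeats both characters, while any independent set in the conflict graph can contain at most one match per character.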
