Anytime algorithms for the longest common palindromic subsequence problem

Abstract The longest common palindromic subsequence (LCPS) problem aims at finding a longest string that appears as a subsequence in each of a set of input strings and is a palindrome at the same time. The problem is a special variant of the well-known longest common subsequence problem and has applications in particular in genomics and biology, where strings correspond to DNA or protein sequences and similarities among them are to be detected or quantified. We first present a more traditional A* search that makes use of an advanced upper bound calculation for partial solutions. This exact approach works well for instances with two input strings and, as shown in experiments, outperforms several other exact methods from the literature. However, due to the problem’s complexity, the A* search has natural limitations when a larger number of strings is considered. To deal with this case effectively in practice, anytime A* search variants are investigated, which are able to return a reasonable heuristic solution at almost any time and are expected to find better and better solutions until reaching a proven optimum when enough time is given. In particular, a novel approach is proposed in which Anytime Column Search (ACS) is interleaved with traditional A* node expansions. The ACS iterations are guided by a new heuristic function that approximates the expected length of an LCPS in subproblems usually much better than the available upper bound calculation. This A*+ACS hybrid is able to solve small to medium-sized LCPS instances to proven optimality while returning good heuristic solutions together with upper bounds for large instances. In rigorous experimental evaluations, we compare A*+ACS to several other anytime A* search variants and observe its superiority.
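To make the problem definition concrete, the following is a minimal sketch of a memoized recursion that computes the LCPS length for exactly two input strings. It is not the A* search or the A*+ACS hybrid described in the abstract; it is a plain dynamic-programming baseline, and the function name `lcps_length` is a hypothetical choice made for this illustration. Because it visits up to O(n²m²) states, it is only practical for short strings.

```python
from functools import lru_cache


def lcps_length(x: str, y: str) -> int:
    """Length of a longest common palindromic subsequence of x and y.

    Illustrative baseline only (assumed helper, two strings, small inputs).
    """

    @lru_cache(maxsize=None)
    def best(i: int, j: int, k: int, l: int) -> int:
        # State: substrings x[i..j] and y[k..l], both bounds inclusive.
        if i > j or k > l:
            return 0  # one substring is empty, so nothing in common
        if x[i] == x[j] == y[k] == y[l]:
            # The same letter sits at both ends of both substrings.
            if i == j or k == l:
                return 1  # only a single occurrence can be used
            # Use the letter as the outer pair of the palindrome.
            return 2 + best(i + 1, j - 1, k + 1, l - 1)
        # Otherwise shrink one of the four boundaries and keep the best.
        return max(
            best(i + 1, j, k, l),
            best(i, j - 1, k, l),
            best(i, j, k + 1, l),
            best(i, j, k, l - 1),
        )

    return best(0, len(x) - 1, 0, len(y) - 1)
```

For example, `lcps_length("acbca", "abcba")` evaluates to 3, since "aba" and "aca" are common palindromic subsequences of both strings and no common palindromic subsequence of length 4 exists.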
