Modeling Dynamic Programming Problems over Sequences and Trees with Inverse Coupled Rewrite Systems

Dynamic programming is a classical algorithmic paradigm that often allows a search space of exponential size to be evaluated in polynomial time. Its well-understood ingredients are recursive problem decomposition, tabulation of intermediate results for reuse, and Bellman’s Principle of Optimality. However, dynamic programming algorithms are often formulated at a low level of abstraction, which makes them difficult to implement, tedious to debug, and delicate to modify. The present article proposes a generic framework for specifying dynamic programming problems. The framework handles all kinds of sequential inputs as well as tree-structured data. It applies to biosequence analysis, document processing, molecular structure analysis, the comparison of hierarchically assembled objects, and, more generally, any domain in which strings and ordered, rooted trees serve as natural data representations. The new approach introduces inverse coupled rewrite systems. These describe the solutions of combinatorial optimization problems as the inverse image of a term rewrite relation that reduces problem solutions to problem inputs. The result is a concise yet transparent specification of a dynamic programming algorithm. Implementing such specifications may be challenging, but we hope that implementations can eventually be produced automatically. The article demonstrates the scope of the approach by describing a diverse set of dynamic programming problems from computational biology, with examples in biosequence and molecular structure analysis.
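To make the idea of an inverse image more concrete, the following Python sketch uses string edit distance as a stand-in problem. It is only an illustration under stated assumptions, not the formalism of the article: the encoding of alignments as lists of `rep`/`del`/`ins` operations, the unit scoring, and all function names are hypothetical. A candidate solution is rewritten to the pair of inputs it explains, the search space for two given inputs is the set of all candidates that rewrite to them, and a tabulated recursion finds the optimum without enumerating that set.

```python
from functools import lru_cache

# Hypothetical encoding of a candidate solution (an alignment):
#   ("rep", a, b)  character a of x is matched or replaced by b of y
#   ("del", a)     character a occurs only in x
#   ("ins", b)     character b occurs only in y

def rewrite_to_inputs(alignment):
    """Reduce a candidate solution to the pair of inputs it explains."""
    x = "".join(op[1] for op in alignment if op[0] in ("rep", "del"))
    y = "".join(op[-1] for op in alignment if op[0] in ("rep", "ins"))
    return x, y

def score(alignment):
    """Unit-cost scoring (an assumption): matches are free, edits cost 1."""
    return sum(0 if op[0] == "rep" and op[1] == op[2] else 1
               for op in alignment)

def inverse_image(x, y):
    """Enumerate every alignment that rewrites to (x, y). Exponential size."""
    if not x and not y:
        yield []
        return
    if x and y:
        for rest in inverse_image(x[1:], y[1:]):
            yield [("rep", x[0], y[0])] + rest
    if x:
        for rest in inverse_image(x[1:], y):
            yield [("del", x[0])] + rest
    if y:
        for rest in inverse_image(x, y[1:]):
            yield [("ins", y[0])] + rest

def edit_distance(x, y):
    """Same optimum via recursion plus tabulation, in O(len(x) * len(y))."""
    @lru_cache(maxsize=None)
    def d(i, j):
        if i == len(x) and j == len(y):
            return 0
        options = []
        if i < len(x) and j < len(y):
            options.append((x[i] != y[j]) + d(i + 1, j + 1))
        if i < len(x):
            options.append(1 + d(i + 1, j))
        if j < len(y):
            options.append(1 + d(i, j + 1))
        # Bellman's principle: an optimal solution extends optimal sub-solutions.
        return min(options)
    return d(0, 0)

if __name__ == "__main__":
    candidates = list(inverse_image("abc", "abd"))
    print(len(candidates))                    # size of the search space
    print(min(score(c) for c in candidates))  # optimum by brute force
    print(edit_distance("abc", "abd"))        # optimum by dynamic programming
```

The brute-force minimum over the enumerated candidates and the tabulated recursion agree, while only the latter scales to realistic input lengths.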
