A discipline of dynamic programming over sequence data

Dynamic programming is a classical programming technique, applicable in a wide variety of domains such as stochastic systems analysis, operations research, combinatorics of discrete structures, flow problems, parsing of ambiguous languages, and biosequence analysis. Little methodology has hitherto been available to guide the design of such algorithms. The matrix recurrences that typically describe a dynamic programming algorithm are difficult to construct, error-prone to implement, and, in nontrivial applications, almost impossible to debug completely.

This article introduces a discipline designed to alleviate this problem. We describe an algebraic style of dynamic programming over sequence data. We define its formal framework, based on a combination of grammars and algebras, and including a formalization of Bellman's Principle. We suggest a language used for algorithm design on a convenient level of abstraction. We outline three ways of implementing this language, including an embedding in a lazy functional language. The workings of the new method are illustrated by a series of examples drawn from diverse areas of computer science.
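As a rough illustration of the grammar-plus-algebra idea, the following Python sketch separates the search space (a fixed set of alignment cases playing the role of the grammar) from its evaluation (an interchangeable algebra with a choice function). It is not the language or the implementation proposed in the article; the names `Algebra`, `align`, `unit_cost`, and `count` are hypothetical and chosen only for this example. Applying the choice function at every subproblem is sound here because both example algebras are compatible with Bellman's Principle.

```python
from functools import lru_cache
from typing import Callable, List, NamedTuple


class Algebra(NamedTuple):
    """One evaluation function per alignment case, plus a choice function.

    All names here are illustrative assumptions, not the article's notation.
    """
    nil: Callable[[], int]
    delete: Callable[[str, int], int]
    insert: Callable[[int, str], int]
    replace: Callable[[str, int, str], int]
    choose: Callable[[List[int]], List[int]]


def align(x: str, y: str, alg: Algebra) -> List[int]:
    """Evaluate all alignments of x and y under the given algebra.

    The case analysis below plays the role of the grammar: it enumerates
    how an alignment of x[i:] with y[j:] can be built, and never mentions
    scores. Memoization replaces the explicit matrix.
    """
    @lru_cache(maxsize=None)
    def table(i: int, j: int) -> tuple:
        candidates: List[int] = []
        if i == len(x) and j == len(y):
            candidates.append(alg.nil())
        if i < len(x):
            candidates += [alg.delete(x[i], v) for v in table(i + 1, j)]
        if j < len(y):
            candidates += [alg.insert(v, y[j]) for v in table(i, j + 1)]
        if i < len(x) and j < len(y):
            candidates += [alg.replace(x[i], v, y[j])
                           for v in table(i + 1, j + 1)]
        # The choice function is applied to every intermediate result list.
        return tuple(alg.choose(candidates))

    return list(table(0, 0))


# Scoring algebra: unit-cost edit distance, minimizing.
unit_cost = Algebra(
    nil=lambda: 0,
    delete=lambda a, v: v + 1,
    insert=lambda v, b: v + 1,
    replace=lambda a, v, b: v + (0 if a == b else 1),
    choose=lambda vs: [min(vs)],
)

# Counting algebra: every alignment counts as 1, choice sums the counts.
count = Algebra(
    nil=lambda: 1,
    delete=lambda a, v: v,
    insert=lambda v, b: v,
    replace=lambda a, v, b: v,
    choose=lambda vs: [sum(vs)],
)

print(align("kitten", "sitting", unit_cost))  # edit distance
print(align("ab", "cd", count))               # size of the search space
```

The same `align` function answers two different questions depending only on the algebra passed in, which is the kind of separation of concerns the abstract describes.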
