Bellman's GAP: a 2nd generation language and system for algebraic dynamic programming

This dissertation describes Bellman’s GAP, a programming system for writing dynamic programming algorithms over sequential data. It is the second-generation implementation of the algebraic dynamic programming (ADP) framework. The system comprises the multi-paradigm language GAP-L, its compiler GAP-C, the functional modules GAP-M, and a website (GAP Pages) for experimenting with GAP-L programs. GAP-L includes declarative constructs, e.g. tree grammars to model the search space, and imperative constructs for programming advanced scoring functions. Its syntax is similar to C/Java to lower the barrier to entry. GAP-C translates high-level, index-free GAP-L programs into efficient C++ code that is competitive with handwritten code. It includes a novel table design optimization algorithm and supports dynamic programming (DP) over multiple sequences (multi-track DP), sampling, optional top-down evaluation, and various backtracing schemes. GAP-M provides modules for use in GAP-L programs; examples include efficient representations of classification data types, sampling support, and filter helper functions. GAP Pages contains web dialogs for selected textbook dynamic programming algorithms implemented in GAP-L. The web dialogs allow interactive ad-hoc experiments with different inputs and combinations of algebras. Several benchmarks and examples in the dissertation demonstrate the practical efficiency of Bellman’s GAP in terms of both program runtime and development time.
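The separation the abstract refers to, a grammar that describes the search space and interchangeable algebras that score it, can be illustrated with a small sketch. The code below is Python, not GAP-L, and does not reflect how GAP-C generates code. The toy RNA base-pairing grammar and all names (`Algebra`, `count`, `bpmax`, `pretty`, `product`, `evaluate`) are assumptions made for illustration, and the product is a simplified version of the ADP product of algebras.

```python
from collections import namedtuple
from functools import lru_cache

# An algebra gives one function per grammar rule plus a choice function.
# All names here are hypothetical and chosen for illustration only.
Algebra = namedtuple("Algebra", "nil unpaired pair choose")

count = Algebra(  # counts the candidates in the search space
    nil=lambda: 1,
    unpaired=lambda base, s: s,
    pair=lambda a, s, b, t: s * t,
    choose=lambda xs: [sum(xs)] if xs else [],
)
bpmax = Algebra(  # maximizes the number of base pairs
    nil=lambda: 0,
    unpaired=lambda base, s: s,
    pair=lambda a, s, b, t: s + t + 1,
    choose=lambda xs: [max(xs)] if xs else [],
)
pretty = Algebra(  # renders each candidate in dot-bracket notation
    nil=lambda: "",
    unpaired=lambda base, s: "." + s,
    pair=lambda a, s, b, t: "(" + s + ")" + t,
    choose=lambda xs: list(xs),
)

def product(A, B):
    """Simplified product: choose by A, then by B among A's winners."""
    def choose(xs):
        out = []
        firsts = list(dict.fromkeys(x for x, _ in xs))
        for a in A.choose(firsts):
            out.extend((a, b) for b in B.choose([y for x, y in xs if x == a]))
        return out
    return Algebra(
        nil=lambda: (A.nil(), B.nil()),
        unpaired=lambda base, s: (A.unpaired(base, s[0]), B.unpaired(base, s[1])),
        pair=lambda a, s, b, t: (A.pair(a, s[0], b, t[0]), B.pair(a, s[1], b, t[1])),
        choose=choose,
    )

PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}

def evaluate(seq, alg):
    """Evaluate the toy grammar
         struct -> nil | unpaired(base, struct) | pair(base, struct, base, struct)
    over seq with algebra alg, tabulating results per subword (i, j)."""
    @lru_cache(maxsize=None)
    def struct(i, j):
        if i == j:
            return tuple(alg.choose([alg.nil()]))
        cands = [alg.unpaired(seq[i], s) for s in struct(i + 1, j)]
        for k in range(i + 1, j):
            if (seq[i], seq[k]) in PAIRS:  # filter: only valid base pairs
                cands.extend(alg.pair(seq[i], s, seq[k], t)
                             for s in struct(i + 1, k)
                             for t in struct(k + 1, j))
        return tuple(alg.choose(cands))
    return list(struct(0, len(seq)))
```

In this sketch, `evaluate("CAG", count)` counts the candidate structures, `evaluate("CAG", bpmax)` returns the optimal score, and `evaluate("CAG", product(bpmax, pretty))` returns the optimal score together with the structures that achieve it. The grammar is written once; only the algebra changes. Note that the simplified product is only meaningful when the first algebra is an optimizing one such as `bpmax`.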
