Challenges in the compilation of a domain specific language for dynamic programming

Many combinatorial optimization problems in biosequence analysis are solved via dynamic programming. To increase programming productivity and program reliability, a domain-specific language embedded in Haskell has been proposed. We point out several shortcomings of this embedded approach and report on challenges in the ongoing project of migrating the language from its host language to a directly compiled implementation. Most of these challenges are domain-specific optimizations, which not only reduce runtime and space requirements by significant constant factors but also affect asymptotic efficiency. We report on our solutions to some of these problems and point out others that are still open.
