Pareto optimization in algebraic dynamic programming

Pareto optimization combines independent objectives by computing the Pareto front of its search space, defined as the set of all solutions for which no other candidate solution scores better under all objectives. This gives, in a precise sense, better information than an artificial amalgamation of different scores into a single objective, but is more costly to compute. Pareto optimization naturally occurs with genetic algorithms, albeit in a heuristic fashion. Non-heuristic Pareto optimization so far has been used only with a few applications in bioinformatics. We study exact Pareto optimization for two objectives in a dynamic programming framework. We define a binary Pareto product operator $*_{\text{Par}}$ on arbitrary scoring schemes. Independent of a particular algorithm, we prove that for two scoring schemes $A$ and $B$ used in dynamic programming, the scoring scheme $A *_{\text{Par}} B$ correctly performs Pareto optimization over the same search space. We study different implementations of the Pareto operator with respect to their asymptotic and empirical efficiency. Without artificial amalgamation of objectives, and with no heuristics involved, Pareto optimization is faster than computing the same number of answers separately for each objective. For RNA structure prediction under the minimum free energy versus the maximum expected accuracy model, we show that the empirical size of the Pareto front remains within reasonable bounds. Pareto optimization lends itself to the comparative investigation of the behavior of two alternative scoring schemes for the same purpose. For the above scoring schemes, we observe that the Pareto front can be seen as a composition of a few macrostates, each consisting of several microstates that differ in the same limited way.
We also study the relationship between abstract shape analysis and the Pareto front, and find that they extract information of a different nature from the folding space and can be meaningfully combined.
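
As a rough illustration of the idea behind a two-objective Pareto choice function, the following Python sketch computes the Pareto front of a list of score pairs by sorting and a single scan, and then applies that choice function after combining the fronts of two subproblems. It is a minimal sketch and not the authors' implementation: the names `pareto_front` and `combine` are hypothetical, both objectives are assumed to be maximized, scores are assumed to combine additively, and dominance is taken in the usual sense (at least as good in both objectives and different in at least one).

```python
from typing import Iterable, List, Tuple

Score = Tuple[float, float]  # (score under A, score under B); assumed layout


def pareto_front(candidates: Iterable[Score]) -> List[Score]:
    """Return the non-dominated pairs, assuming both objectives are maximized."""
    front: List[Score] = []
    best_b = float("-inf")
    # Sort by the first objective (descending), ties broken by the second
    # (descending). A pair then survives only if its second score strictly
    # exceeds every second score seen so far.
    for a, b in sorted(set(candidates), reverse=True):
        if b > best_b:
            front.append((a, b))
            best_b = b
    return front


def combine(left: List[Score], right: List[Score]) -> List[Score]:
    """Hypothetical DP step: add scores of two subsolutions pairwise,
    then keep only the Pareto front of the results."""
    sums = [(a1 + a2, b1 + b2) for a1, b1 in left for a2, b2 in right]
    return pareto_front(sums)


if __name__ == "__main__":
    print(pareto_front([(1, 5), (3, 3), (2, 4), (2, 2), (3, 1), (1, 5)]))
    print(combine([(3, 3), (1, 5)], [(2, 0), (0, 2)]))
```

Sorting first keeps the front computation at O(n log n) for n candidates, which is one of several possible implementation strategies; the sketch makes no claim about which variant performs best in practice.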
