Lynx: A Programmatic SAT Solver for the RNA-Folding Problem

This paper introduces Lynx, an incremental programmatic SAT solver that allows non-expert users to introduce domain-specific code into modern conflict-driven clause-learning (CDCL) SAT solvers, enabling them to guide the solver's behavior. The key idea of Lynx is a callback interface through which non-expert users can specialize the SAT solver to a class of Boolean instances. The user writes specialized code for a class of Boolean formulas, and Lynx's search routine calls this code periodically from its inner loop through the callback interface. The user-provided code may examine partial solutions generated by the solver during its search and respond by adding CNF clauses back to the solver dynamically and incrementally. In this way, the user-provided code can specialize and influence the solver's search in a highly targeted fashion. Although the power of incremental SAT solvers has been amply demonstrated in the SAT literature and in the context of DPLL(T), it has not previously been made available through a programmatic API that non-expert users can easily use. Lynx's callback interface is a simple yet very effective way to address this need. We demonstrate the benefits of Lynx through a case study from computational biology: the RNA secondary structure prediction problem. The constraints of this problem fall into two categories: structural constraints, which describe properties of the biological structure of the solution, and energetic constraints, which encode quantitative requirements that the solution must satisfy. We show that by introducing structural constraints on demand through user-provided code, we can achieve up to a 30x reduction in memory usage and up to a 100x reduction in running time compared with standard SAT approaches.
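To make the callback idea concrete, here is a minimal Python sketch under stated assumptions; it is not Lynx's actual API. It uses a toy DPLL search (no clause learning, unlike the CDCL solvers Lynx builds on) that calls a user function after each round of unit propagation and permanently adds whatever new clauses that function returns. The names `solve`, `structural_callback`, `PAIRS`, and `CLAUSES`, as well as the five-variable base-pair encoding, are hypothetical. In particular, forbidding crossing pairs is an assumed pseudoknot-free structural constraint, and the initial clauses are a stand-in for the rest of the formula rather than a real energetic model.

```python
from typing import Callable, Dict, List, Optional

Clause = List[int]            # DIMACS-style literals: +v is "v true", -v is "v false"
Assignment = Dict[int, bool]  # variable -> value, for assigned variables only
Callback = Callable[[Assignment], List[Clause]]


def lit_value(lit: int, assign: Assignment) -> Optional[bool]:
    """Truth value of a literal under a partial assignment (None if unassigned)."""
    val = assign.get(abs(lit))
    if val is None:
        return None
    return val if lit > 0 else not val


def unit_propagate(clauses: List[Clause], assign: Assignment) -> bool:
    """Extend `assign` in place by unit propagation; return False on a conflict."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            vals = [lit_value(lit, assign) for lit in clause]
            if True in vals:
                continue  # clause already satisfied
            free = [lit for lit, val in zip(clause, vals) if val is None]
            if not free:
                return False  # every literal is false: conflict
            if len(free) == 1:
                assign[abs(free[0])] = free[0] > 0  # forced (unit) literal
                changed = True
    return True


def solve(clauses: List[Clause], num_vars: int, callback: Callback) -> Optional[Assignment]:
    """Toy DPLL search (no clause learning) that consults `callback` after every
    propagation step and permanently adds any new clauses it returns."""
    clauses = [list(c) for c in clauses]
    known = {frozenset(c) for c in clauses}

    def search(assign: Assignment) -> Optional[Assignment]:
        while True:
            if not unit_propagate(clauses, assign):
                return None
            new = [c for c in callback(dict(assign)) if frozenset(c) not in known]
            if not new:
                break
            for c in new:  # lazily add user-generated clauses, then re-propagate
                known.add(frozenset(c))
                clauses.append(c)
        unassigned = [v for v in range(1, num_vars + 1) if v not in assign]
        if not unassigned:
            return assign  # complete assignment the callback did not reject
        var = unassigned[0]
        for value in (True, False):
            result = search({**assign, var: value})
            if result is not None:
                return result
        return None

    return search({})


# Hypothetical RNA encoding: variable v is true iff candidate base pair PAIRS[v] forms.
PAIRS = {1: (0, 8), 2: (1, 7), 3: (2, 6), 4: (0, 7), 5: (1, 8)}


def structural_callback(assign: Assignment) -> List[Clause]:
    """Return a blocking clause for each two chosen base pairs that share a base
    or cross (assumed pseudoknot-free); every such clause is falsified by `assign`."""
    chosen = sorted(v for v, val in assign.items() if val)
    out = []
    for idx, a in enumerate(chosen):
        for b in chosen[idx + 1:]:
            (i, j), (k, l) = PAIRS[a], PAIRS[b]
            shares_base = bool({i, j} & {k, l})
            crosses = i < k < j < l or k < i < l < j
            if shares_base or crosses:
                out.append([-a, -b])
    return out


# Hypothetical stand-in for the rest of the formula: bases 0, 1 and 2 must pair.
CLAUSES = [[1, 4], [2, 5], [3]]

model = solve(CLAUSES, 5, structural_callback)
print(sorted(PAIRS[v] for v, val in model.items() if val))  # [(0, 8), (1, 7), (2, 6)]
```

Because this callback only returns clauses that the current partial assignment already falsifies, each one triggers an immediate conflict, and constraints that are never violated during the search (here, the clause forbidding the crossing pairs (0, 7) and (1, 8)) are never materialized. This is the on-demand style of constraint generation described above, as opposed to encoding every structural constraint up front.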
