Minimum length RNA folding trajectories

The Kinfold and KFOLD programs for RNA folding kinetics implement the Gillespie algorithm to generate stochastic folding trajectories from an initial structure s to a target structure t, in which each intermediate secondary structure is obtained from its predecessor by the addition, removal or shift of a single base pair. These three move types constitute the move set MS2. Define the MS2 distance between secondary structures s and t to be the minimum number of steps needed to refold s into t, where each step applies one move from MS2. We describe algorithms to compute a shortest MS2 folding trajectory between any two given RNA secondary structures. These algorithms include an optimal integer programming (IP) algorithm, an accurate and efficient near-optimal algorithm, a greedy algorithm, a branch-and-bound algorithm, and an algorithm that is optimal if one allows intermediate structures to contain pseudoknots. Our optimal IP [resp. near-optimal IP] algorithm maximizes [resp. approximately maximizes] the number of shifts and minimizes [resp. approximately minimizes] the number of base pair additions and removals by applying integer programming to (essentially) solve the minimum feedback vertex set (FVS) problem for the RNA conflict digraph, and then applies topological sort to tether subtrajectories into the final optimal folding trajectory. We prove that it is NP-hard to determine the minimum barrier energy over all possible MS2 folding pathways, and we conjecture that computing the MS2 distance between arbitrary secondary structures is NP-hard. Since our optimal IP algorithm relies on the FVS problem, known to be NP-complete for arbitrary digraphs, we compare the family of RNA conflict digraphs with the following classes of digraphs (planar, reducible flow graph, Eulerian, and tournament), for which the FVS problem is known to be either polynomial-time computable or NP-hard. Source code available at this http URL
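To make the definition of MS2 distance concrete, the following is a minimal brute-force sketch, not the IP, near-optimal, greedy or branch-and-bound algorithms described above. It runs a breadth-first search over all pseudoknot-free structures on n positions, so it is only practical for very short molecules. Everything in it is an assumption made for illustration: the function names `is_valid`, `neighbors` and `ms2_distance` are hypothetical, structures are sets of 0-indexed pairs (i, j) with i < j, a hairpin must enclose at least `theta = 3` unpaired positions, sequence compatibility of base pairs is ignored, and intermediate structures may use any pair, not only pairs from s or t.

```python
from collections import deque
from itertools import combinations


def is_valid(struct, theta=3):
    """Check hairpin span, one partner per position, and no crossing pairs."""
    used = set()
    for i, j in struct:
        if j - i <= theta or i in used or j in used:
            return False
        used.update((i, j))
    for (i, j), (k, l) in combinations(struct, 2):
        if i < k < j < l or k < i < l < j:  # crossing pairs form a pseudoknot
            return False
    return True


def neighbors(struct, n, theta=3):
    """Return all structures reachable from struct by one MS2 move."""
    all_pairs = [(i, j) for i in range(n) for j in range(i + theta + 1, n)]
    out = set()
    for p in struct:
        out.add(struct - {p})  # removal of base pair p
        for q in all_pairs:
            # shift: q keeps exactly one of the two positions of p
            if q not in struct and len(set(p) & set(q)) == 1:
                cand = (struct - {p}) | {q}
                if is_valid(cand, theta):
                    out.add(cand)
    for q in all_pairs:
        if q not in struct:
            cand = struct | {q}  # addition of base pair q
            if is_valid(cand, theta):
                out.add(cand)
    return out


def ms2_distance(s, t, n, theta=3):
    """Breadth-first search for the minimum number of MS2 moves from s to t."""
    s, t = frozenset(s), frozenset(t)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        if cur == t:
            return dist[cur]
        for nxt in neighbors(cur, n, theta):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None  # t is not a valid structure on n positions
```

For example, `ms2_distance({(0, 4)}, {(0, 7)}, n=8)` needs a single shift, whereas a move set with only additions and removals would need two steps (remove (0, 4), then add (0, 7)). When s and t share no positions, as in `ms2_distance({(0, 4)}, {(5, 9)}, n=10)`, no shift helps and two moves are required.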
