ExpertRNA: A new framework for RNA structure prediction

Ribonucleic acid (RNA) is a fundamental biological molecule that is essential to all living organisms and performs a versatile array of cellular tasks. The function of many RNA molecules is strongly related to the structure they adopt. As a result, great effort is being dedicated to the design of efficient algorithms that solve the “folding problem”: given a sequence of nucleotides, return a probable list of base pairs, referred to as the secondary structure prediction. Early algorithms largely relied on finding the structure with minimum free energy. However, these predictions rely on simplified effective free energy models, which may not assign the lowest free energy to the correct structure. In light of this, new data-driven approaches that not only consider free energy but also use machine learning techniques to learn motifs have been investigated, and have recently been shown to outperform free-energy-based algorithms on several experimental data sets. In this work, we introduce ExpertRNA, a new algorithm that provides a modular framework which can easily incorporate an arbitrary number of rewards (free energy or non-parametric/data-driven) and secondary structure prediction algorithms. We argue that this capability has the potential to balance out the different strengths and weaknesses of state-of-the-art folding tools. We test ExpertRNA on several RNA sequence-structure data sets and compare its performance against a state-of-the-art folding algorithm. We find that ExpertRNA produces, on average, more accurate predictions than the structure prediction algorithm used, thus validating the promise of the approach.
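To make the problem statement concrete, the following minimal Python sketch shows one way to represent a secondary structure as a list of base pairs and to choose among candidate structures using an arbitrary number of reward functions. It is an illustration only and not the ExpertRNA algorithm: the dot-bracket input format, the function names (`dot_bracket_to_pairs`, `canonical_pair_reward`, `select_structure`), and the toy reward, which merely counts canonical pairs in place of a free energy or learned model, are all assumptions introduced here.

```python
from typing import Callable, List, Tuple

Pairs = List[Tuple[int, int]]
Reward = Callable[[str, str], float]  # (sequence, structure) -> score


def dot_bracket_to_pairs(structure: str) -> Pairs:
    """Convert a pseudoknot-free dot-bracket string to 0-based base pairs."""
    stack: List[int] = []
    pairs: Pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


# Hypothetical toy reward: number of Watson-Crick or G-U wobble pairs.
# A real reward would be a free energy model or a data-driven score.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def canonical_pair_reward(sequence: str, structure: str) -> float:
    pairs = dot_bracket_to_pairs(structure)
    return float(sum((sequence[i], sequence[j]) in CANONICAL for i, j in pairs))


def select_structure(sequence: str, candidates: List[str], rewards: List[Reward]) -> str:
    """Return the candidate with the highest summed reward (ties: first wins)."""
    return max(candidates, key=lambda s: sum(r(sequence, s) for r in rewards))


if __name__ == "__main__":
    seq = "GGGAAACCC"
    # Candidates would normally come from one or more folding tools.
    candidates = [".........", "((.....))", "(((...)))"]
    best = select_structure(seq, candidates, [canonical_pair_reward])
    print(best, dot_bracket_to_pairs(best))
```

Summing the rewards is only one possible way to combine them; the point of the sketch is that both the list of candidate generators and the list of rewards can grow without changing the selection code.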
