Determining parameters for non-linear models of multi-loop free energy change

MOTIVATION: Predicting the secondary structure of RNA is a fundamental task in bioinformatics. Central to this task are algorithms that predict secondary structure from the primary sequence alone, together with a model that evaluates the quality of a candidate structure. These algorithms have been updated as models of RNA thermodynamics have been revised and expanded. Multi-loops are the exception. More advanced models of multi-loop free energy change have been proposed, yet a simple linear model has been used since the 1980s. Recently, new dynamic programming algorithms for secondary structure prediction that can incorporate these advanced models were presented. Unfortunately, the advanced models appear to give lower secondary structure prediction accuracy than the linear model.

RESULTS: We apply linear regression and a new parameter optimization algorithm to find better parameters for both the existing linear model and advanced, non-linear multi-loop models, including the Jacobson-Stockmayer and Aalberts & Nandagopal models. We find that the current linear model parameters may be near optimal for the linear model. We also find that no advanced model outperforms the existing linear model parameters, even after parameter optimization.

AVAILABILITY: Source code and data are available at https://github.com/maxhwardg/advanced_multiloops.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
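The linear multi-loop model and the idea of fitting its parameters by linear regression can be illustrated with a small sketch. It assumes the conventional three-parameter form ΔG = a + b·(unpaired nucleotides) + c·(branching helices). The parameter values and loop data below are hypothetical and are used only to generate synthetic energies. The sketch shows ordinary least squares via the normal equations. It is not the authors' parameter optimization algorithm, and it does not implement the Jacobson-Stockmayer or Aalberts & Nandagopal models.

```python
from typing import List, Sequence, Tuple


def linear_multiloop_energy(a: float, b: float, c: float,
                            unpaired: int, branches: int) -> float:
    """Assumed linear multi-loop model: initiation term a, plus b per
    unpaired nucleotide, plus c per branching helix (closing helix included)."""
    return a + b * unpaired + c * branches


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination
    with partial pivoting."""
    n = len(rhs)
    # Build the augmented matrix [A | rhs] without mutating the inputs.
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Move the row with the largest pivot magnitude into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_multiloop(loops: Sequence[Tuple[int, int]],
                         energies: Sequence[float]) -> List[float]:
    """Least squares fit of (a, b, c) from the normal equations
    X^T X beta = X^T y, with design rows [1, unpaired, branches]."""
    rows = [[1.0, float(u), float(h)] for u, h in loops]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, energies)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical parameters (kcal/mol), used only to generate synthetic data.
TRUE_A, TRUE_B, TRUE_C = 3.0, 0.1, 0.5
loops = [(0, 3), (2, 3), (5, 4), (3, 5), (8, 3), (1, 4)]  # (unpaired, branches)
energies = [linear_multiloop_energy(TRUE_A, TRUE_B, TRUE_C, u, h) for u, h in loops]
a_fit, b_fit, c_fit = fit_linear_multiloop(loops, energies)
print(f"a={a_fit:.3f} b={b_fit:.3f} c={c_fit:.3f}")
```

The synthetic energies are noise-free, so the fit recovers the generating parameters. With real measured loop free energies, the residuals would be non-zero. A logarithmic loop-size term could be appended as an extra design-matrix column, because the fit would remain linear in its parameters. Models that are non-linear in their parameters would need a different optimizer.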
