Modeling RNA Secondary Structure with Sequence Comparison and Experimental Mapping Data.

Secondary structure prediction is an important problem in RNA bioinformatics because knowledge of structure is critical to understanding the functions of RNA sequences. Significant improvements in prediction accuracy have recently been demonstrated through the incorporation of experimentally obtained structural information, for instance from selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) mapping. However, such mapping data are currently available for only a limited number of RNA sequences. In this article, we present a method for extending the benefit of experimental mapping data in secondary structure prediction to homologous sequences. Specifically, we integrate experimental mapping data into a comparative sequence analysis algorithm that predicts secondary structures for multiple homologs, so that the mapping data benefit not only the prediction for the sequence that was mapped but also the predictions for its homologs. The method is realized by modifying the TurboFold II algorithm for RNA secondary structure prediction to use base pairing probabilities guided by SHAPE data when such data are available. The SHAPE-guided base pairing probabilities are obtained with the RSample method. Results demonstrate that SHAPE mapping data for one sequence improve the structure prediction accuracy of other homologous sequences beyond the accuracy obtained by sequence comparison alone (TurboFold II). The updated version of TurboFold II is freely available as part of the RNAstructure software package.
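As a rough illustration of how mapping data for one sequence could inform a homolog, the sketch below projects a base pairing probability matrix from a mapped sequence onto an unmapped target through posterior alignment probabilities. It is a simplified toy under stated assumptions, not the TurboFold II or RSample implementation: the function name `transfer_pairing_probabilities`, the input layout, and the plain product-sum rule are all hypothetical choices made for this example, and the published algorithm additionally weights and iterates such information across many sequences.

```python
def transfer_pairing_probabilities(pair_probs_mapped, align_probs):
    """Project pairing probabilities from a mapped sequence onto a target.

    pair_probs_mapped[k][l]: probability that nucleotides k and l of the
        mapped sequence pair (assumed here to be SHAPE-guided already).
    align_probs[i][k]: probability that target nucleotide i aligns with
        mapped nucleotide k (assumed to come from a pairwise alignment model).

    Returns a symmetric matrix whose (i, j) entry is
        sum over k < l of align_probs[i][k] * align_probs[j][l] * pair_probs_mapped[k][l].
    This is an illustrative rule, not the exact TurboFold II formula.
    """
    n_target = len(align_probs)
    n_mapped = len(pair_probs_mapped)
    extrinsic = [[0.0] * n_target for _ in range(n_target)]
    for i in range(n_target):
        for j in range(i + 1, n_target):
            total = 0.0
            for k in range(n_mapped):
                a_ik = align_probs[i][k]
                if a_ik == 0.0:
                    continue  # no alignment support for (i, k)
                for l in range(k + 1, n_mapped):
                    total += a_ik * align_probs[j][l] * pair_probs_mapped[k][l]
            extrinsic[i][j] = extrinsic[j][i] = total
    return extrinsic


# Toy input: a 3-nucleotide mapped sequence whose ends pair with
# probability 0.9, and a target that aligns to it position by position.
mapped_pairs = [
    [0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0],
]
identity_alignment = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
projected = transfer_pairing_probabilities(mapped_pairs, identity_alignment)
```

With a perfect one-to-one alignment, the target simply inherits the mapped sequence's pairing probabilities. With uncertain alignments, the support for each target pair is spread over the aligned positions, which is the sense in which mapping data for one homolog can inform predictions for another.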
