bpRNA: large-scale automated annotation and analysis of RNA secondary structure

Abstract

While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, and easily interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, ‘bpRNA-1m’, of over 100 000 single-molecule, known secondary structures; it is both more fully and more accurately annotated than existing databases and over 20 times larger. We use a subset of the database from which highly similar (≥90% identical) sequences have been filtered out to report statistical trends in sequence, flanking base pairs, and length. The bpRNA method and the bpRNA-1m database will be valuable resources for the analysis of individual RNA molecules and for large-scale analyses, such as updating RNA energy parameters for computational thermodynamic prediction, improving machine learning models for structure prediction, and benchmarking structure-prediction algorithms.
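The annotation described above starts from a list of base pairs and derives structural features from it. The following Python example is a minimal illustrative sketch, not bpRNA's implementation or its exact feature definitions. It assumes an extended dot-bracket input in which `()`, `[]`, `{}` and `<>` mark pairs and `.` marks an unpaired base. It reports three simplified features:

- stems, defined as maximal runs of stacked pairs;
- hairpin loops, defined as unpaired runs closed by a single pair;
- pseudoknotted pairs, defined as pairs that cross another pair.

All function names (`parse_pairs`, `find_stems`, `find_hairpins`, `crossing_pairs`, `annotate`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch, not bpRNA's code. Assumed extended dot-bracket
# alphabet: these bracket pairs mark base pairs, "." marks an unpaired base.
OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}

Pair = Tuple[int, int]


def parse_pairs(db: str) -> List[Pair]:
    """Return 0-based base pairs (i, j), with i < j, sorted by i."""
    stacks: Dict[str, List[int]] = {opener: [] for opener in OPENERS}
    pairs: List[Pair] = []
    for pos, ch in enumerate(db):
        if ch in OPENERS:
            stacks[ch].append(pos)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return sorted(pairs)


def find_stems(pairs: List[Pair]) -> List[List[Pair]]:
    """Simplified stem: a maximal run of stacked pairs (i, j), (i+1, j-1), ..."""
    partner = {i: j for i, j in pairs}
    visited = set()
    stems: List[List[Pair]] = []
    for i, j in pairs:  # sorted by i, so each stem is entered at its outermost pair
        if i in visited:
            continue
        a, b = i, j
        stem = [(a, b)]
        visited.add(a)
        while partner.get(a + 1) == b - 1:  # next pair stacks directly inside
            a, b = a + 1, b - 1
            stem.append((a, b))
            visited.add(a)
        stems.append(stem)
    return stems


def find_hairpins(db: str, pairs: List[Pair]) -> List[Pair]:
    """Simplified hairpin: closing pair (i, j) with every base between them unpaired."""
    return [(i, j) for i, j in pairs
            if j - i > 1 and all(db[k] == "." for k in range(i + 1, j))]


def crossing_pairs(pairs: List[Pair]) -> List[Pair]:
    """Pairs crossing at least one other pair (i < p < j < q): a simple pseudoknot signal."""
    crossing = set()
    for idx, (i, j) in enumerate(pairs):
        for p, q in pairs[idx + 1:]:  # p > i because pairs are sorted by i
            if p < j < q:
                crossing.update({(i, j), (p, q)})
    return sorted(crossing)


def annotate(db: str) -> Dict[str, list]:
    """Collect the simplified features for one dot-bracket structure."""
    pairs = parse_pairs(db)
    return {
        "pairs": pairs,
        "stems": find_stems(pairs),
        "hairpins": find_hairpins(db, pairs),
        "pseudoknot_pairs": crossing_pairs(pairs),
    }


if __name__ == "__main__":
    for structure in ["((((....))))", "((..[[..))..]]"]:
        print(structure, annotate(structure))
```

Positions are 0-based. In the H-type example `((..[[..))..]]`, every pair in both crossing stems is flagged. By contrast, approaches that first extract a nested (pseudoknot-free) substructure keep one stem and label only the other as pseudoknotted. A fuller annotator would also classify bulges, internal loops, multiloops and external loops, and record each feature's sequence and flanking pairs. Those features are omitted here to keep the sketch short.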
