Strategies for measuring evolutionary conservation of RNA secondary structures

Background
Evolutionary conservation of RNA secondary structure is a typical feature of many functional non-coding RNAs. Since almost all available methods for predicting and annotating non-coding RNA genes rely on this evolutionary signature, accurate measures of structural conservation are essential.

Results
We systematically assessed the ability of various measures to detect conserved RNA structures in multiple sequence alignments. We tested three existing and eight novel strategies based on metrics of folding energies, metrics of single optimal structure predictions, and metrics of structure ensembles. We find that the folding-energy-based SCI (structure conservation index) used in the RNAz program and a simple base-pair distance metric are by far the most accurate. More complex metrics, such as tree editing, do not improve performance. A variant of the SCI performed particularly well on highly conserved alignments and is thus a viable alternative when little evolutionary information is available. Surprisingly, ensemble-based methods, which in principle could benefit from the additional information contained in sub-optimal structures, perform particularly poorly. As a general trend, methods that include a consensus structure prediction outperformed equivalent methods that consider only pairwise comparisons.

Conclusion
Structural conservation can be measured accurately with relatively simple and intuitive metrics. These metrics have the potential to form the basis of future RNA gene finders, which face new challenges such as finding lineage-specific structures or detecting misaligned sequences.
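The two measures singled out in the Results, the SCI and the base-pair distance, are simple enough to sketch in a few lines. The Python below is a minimal illustration, not the implementation evaluated in the study. It assumes that the single-sequence structures have already been predicted and mapped onto alignment columns as equal-length dot-bracket strings, and that the folding energies (in kcal/mol) come from an external folding program such as RNAfold/RNAalifold. The SCI follows its usual definition: the consensus folding energy of the alignment divided by the mean minimum free energy of the individual sequences. Averaging the base-pair distance over all sequence pairs is an illustrative choice, because the abstract does not state how the study aggregates or normalizes it. All function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def base_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Return the set of (i, j) base pairs encoded by a dot-bracket string.

    Only '(' and ')' are pairing symbols; any other character
    (e.g. '.' or an alignment gap '-') is treated as unpaired.
    """
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for pos, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Base-pair distance: number of pairs present in exactly one structure."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same (aligned) length")
    # Symmetric difference of the two base-pair sets.
    return len(base_pairs(s1) ^ base_pairs(s2))


def mean_pairwise_bp_distance(structures: list[str]) -> float:
    """Average base-pair distance over all pairs of aligned structures.

    Hypothetical aggregation; the study's exact normalization may differ.
    """
    if len(structures) < 2:
        raise ValueError("need at least two structures")
    return mean(bp_distance(a, b) for a, b in combinations(structures, 2))


def sci(consensus_energy: float, single_energies: list[float]) -> float:
    """Structure conservation index: consensus energy / mean single-sequence MFE.

    Energies are assumed to be precomputed by an external folding tool.
    """
    avg = mean(single_energies)
    if avg == 0:
        raise ValueError("mean single-sequence MFE is zero; SCI is undefined")
    return consensus_energy / avg
```

For example, `mean_pairwise_bp_distance(["((..))", "((..))", "(....)"])` gives 2/3, and `sci(-4.5, [-10.0, -8.0])` gives 0.5. An SCI close to 1 indicates that the sequences fold into a common structure, a value near 0 indicates no shared structure, and covariation in the consensus energy can push the SCI above 1.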
