Multilign: an algorithm to predict secondary structures conserved in multiple RNA sequences

MOTIVATION: With recent advances in sequencing, structural and functional studies of RNA lag behind the discovery of sequences. Computational analysis of RNA is therefore increasingly important for revealing structure-function relationships quickly and at low cost. The purpose of this study is to use multiple homologous sequences to infer a conserved RNA structure.

RESULTS: A new algorithm, called Multilign, is presented to find the lowest free energy RNA secondary structure common to multiple sequences. Multilign is based on Dynalign, a program that simultaneously aligns and folds two sequences to find their lowest free energy conserved structure. In Multilign, Dynalign is used to progressively construct a conserved structure from multiple pairwise calculations, with one sequence used in all pairwise calculations. A base pair is predicted only if it is contained in the set of low free energy structures predicted by every Dynalign calculation. In this way, Multilign improves prediction accuracy by keeping genuine base pairs and excluding competing false base pairs. The computational complexity of Multilign scales linearly with the number of sequences. Multilign was tested on extensive datasets of sequences with known structures, and its prediction accuracy is among the best of the available algorithms. Multilign can run on long sequences (> 1500 nt) and on an arbitrarily large number of sequences.

AVAILABILITY: The algorithm is implemented in ANSI C++ and can be downloaded as part of the RNAstructure package at: http://rna.urmc.rochester.edu.
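The consensus rule described under RESULTS can be illustrated with a minimal Python sketch. It is not the RNAstructure implementation. The pairwise function is a hypothetical stand-in for a Dynalign call, the names `consensus_pairs`, `toy_pairwise` and `TOY_RESULTS` are invented for illustration, and the toy results are made up. The abstract does not say how the real program defines "low free energy structures" or exactly how constraints pass between calculations. Here it is assumed that each pairwise run sees only the pairs that survived earlier runs.

```python
from typing import Callable, List, Set, Tuple

# A base pair (i, j) in index-sequence coordinates, 1-based, with i < j.
BasePair = Tuple[int, int]

# Hypothetical signature for a Dynalign-like pairwise step: given the index
# sequence, one other sequence, and the currently allowed pairs, return the
# index-sequence pairs seen in its low free energy common structures.
PairwiseFn = Callable[[str, str, Set[BasePair]], Set[BasePair]]


def consensus_pairs(index_seq: str, others: List[str],
                    pairwise: PairwiseFn) -> Tuple[Set[BasePair], int]:
    """Intersect the base pairs reported by one pairwise run per other sequence."""
    n = len(index_seq)
    # Start from every pair that leaves a hairpin loop of at least 3 nt.
    allowed: Set[BasePair] = {
        (i, j) for i in range(1, n + 1) for j in range(i + 4, n + 1)
    }
    calls = 0
    for other in others:  # exactly one pairwise calculation per extra sequence
        found = pairwise(index_seq, other, allowed)
        allowed &= found  # a pair survives only if every run reports it
        calls += 1
    return allowed, calls


# Toy stand-in results (invented for illustration, not real Dynalign output);
# the labels "seqB", "seqC", "seqD" play the role of the other sequences.
TOY_RESULTS = {
    "seqB": {(1, 20), (2, 19), (3, 18), (5, 12)},
    "seqC": {(1, 20), (2, 19), (3, 18), (6, 11)},
    "seqD": {(1, 20), (2, 19), (4, 17)},
}


def toy_pairwise(index_seq: str, other: str, allowed: Set[BasePair]) -> Set[BasePair]:
    # Pretend pairwise run: look up canned pairs, restricted to allowed ones.
    return TOY_RESULTS[other] & allowed


if __name__ == "__main__":
    pairs, calls = consensus_pairs("GGGAAAUCCGCAAGGAUCCC",
                                   ["seqB", "seqC", "seqD"], toy_pairwise)
    print(sorted(pairs), calls)  # [(1, 20), (2, 19)] 3
```

Pairs that only some runs report, such as (3, 18), (5, 12) or (4, 17), are discarded. This mirrors how competing false pairs are excluded. The loop makes exactly one pairwise call per additional sequence, so the number of calls grows linearly with the number of sequences, consistent with the scaling stated above.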