A graph theoretical approach for predicting common RNA secondary structure motifs including pseudoknots in unaligned sequences.

MOTIVATION: RNA structure motifs in mRNAs play important roles in regulating gene expression. However, computational identification of novel RNA regulatory motifs has not been widely explored, and effective tools for predicting such motifs from genomic sequences are needed.

RESULTS: We present a new method for predicting common RNA secondary structure motifs in a set of functionally or evolutionarily related RNA sequences. The method is based on comparing stems (palindromic helices) between sequences and is implemented with graph-theoretical approaches. It first finds all possible stable stems in each sequence and compares stems pairwise between sequences, using a set of defined features, to find stems conserved across any two sequences. Then, by applying a maximum clique finding algorithm, it finds all significant stems conserved across at least k sequences. Finally, it assembles in topological order all possible compatible conserved stems shared by at least k sequences and reports a number of the best assembled stem sets as the best candidate common structure motifs. The method does not require prior structural alignment of the sequences and is able to detect pseudoknot structures. We have tested this approach on RNA sequences with known secondary structures, on which it detects the real structures completely or partially correctly and outperforms other existing programs for similar purposes.

AVAILABILITY: The algorithm has been implemented in C++ in a program called comRNA, which is available at http://ural.wustl.edu/softwares.html
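
The first three steps described under RESULTS (stem finding, pairwise stem comparison, clique search for stems shared by at least k sequences) can be illustrated with a minimal Python sketch. It is not the comRNA implementation: stem length stands in for thermodynamic stability, the similarity features (stem length and span) and their tolerances are hypothetical, Bron-Kerbosch maximal clique enumeration is swapped in for the clique algorithm used by the authors, and the final assembly of compatible stems is omitted. All function and parameter names are illustrative assumptions.

```python
from itertools import combinations

# Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stems(seq, min_len=4, min_loop=3):
    """Return maximal stems as (i, j, length): i pairs with j, i+1 with j-1, ...

    Assumption: stem length is used as a crude stand-in for stability.
    """
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Skip starts that can be extended outward; they are not maximal.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            # Extend inward while at least min_loop unpaired bases remain.
            while ((j - length) - (i + length) - 1 >= min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


def similar(a, b, len_tol=1, span_tol=2):
    """Hypothetical similarity test on two features: stem length and span."""
    return (abs(a[2] - b[2]) <= len_tol
            and abs((a[1] - a[0]) - (b[1] - b[0])) <= span_tol)


def maximal_cliques(adj):
    """Bron-Kerbosch enumeration (no pivoting) of all maximal cliques."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques


def conserved_stems(seqs, k, len_tol=1, span_tol=2):
    """Groups of mutually similar stems drawn from at least k sequences."""
    nodes = [(s, stem) for s, seq in enumerate(seqs) for stem in find_stems(seq)]
    adj = {v: set() for v in nodes}
    for a, b in combinations(nodes, 2):
        # Edges only join stems from different sequences, so a clique holds
        # at most one stem per sequence and its size counts sequences.
        if a[0] != b[0] and similar(a[1], b[1], len_tol, span_tol):
            adj[a].add(b)
            adj[b].add(a)
    return [sorted(c) for c in maximal_cliques(adj) if len(c) >= k]


if __name__ == "__main__":
    # Toy input: three sequences share a hairpin, the fourth has no stems.
    seqs = ["GGGGCAAAAGCCCC", "AAGGGGCAAAAGCCCCAA",
            "AGGGGCAAAAAGCCCCA", "AAAAAAAAAA"]
    for group in conserved_stems(seqs, k=3):
        print(group)
```

Because no step aligns the sequences or requires stems to be nested, crossing (pseudoknotted) stems are not excluded at this stage; deciding which conserved stems are mutually compatible is left to the assembly step, which this sketch does not cover.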
