Predicting conserved hairpin motifs in unaligned RNA sequences

Several experiments and observations have shown that small, distinct local structural features of RNA molecules are correlated with their biological function, for example in the post-transcriptional regulation of gene expression. Finding similar structural features in a set of RNA sequences known to share a biological function could therefore indicate which parts of the sequences are responsible for that function. The main difficulty is that in nearly all cases the structure of the molecules is unknown and has to be predicted, and that sequences with little or no similarity can fold into similar structures. The algorithm we present searches for regions of the sequences that, according to base-pairing rules, can fold into similar structures, where the required degree of similarity is defined by the user. Any information about sequence similarity in the motifs can be used either as a search constraint or a posteriori, by post-processing the output. The search for regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns with a symmetric layout, such as those forming RNA hairpins. Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in the secondary structure of non-coding RNA, such as Iron Responsive Elements (IRE) in the untranslated regions of ferritin mRNA and the domain IV stem-loop structure in SRP RNA.
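
To make the idea of regions that "can fold into similar structures" concrete, the following is a minimal sketch of a hairpin search based only on base-pairing rules. It is not the affix-tree algorithm described above: it is a brute-force scan, and the function names (`find_hairpins`, `shared_hairpin_shapes`), the perfectly paired stem, the default stem and loop sizes, and the use of loop length as the only measure of structural similarity are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Watson-Crick pairs plus the G-U wobble pair (illustrative pairing rules).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq: str, stem_len: int = 4,
                  min_loop: int = 3, max_loop: int = 8) -> List[Tuple[int, int]]:
    """Return (start, loop_len) for each window whose two ends can pair
    into a stem of stem_len consecutive base pairs around a loop."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    hits = []
    for start in range(len(seq)):
        for loop_len in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop_len  # exclusive window end
            if end > len(seq):
                break
            # Pair the k-th base of the 5' arm with the k-th base from the 3' end.
            if all((seq[start + k], seq[end - 1 - k]) in PAIRS
                   for k in range(stem_len)):
                hits.append((start, loop_len))
    return hits


def shared_hairpin_shapes(seqs: List[str], stem_len: int = 4,
                          min_loop: int = 3, max_loop: int = 8) -> Set[int]:
    """Return the loop lengths for which every sequence contains
    at least one hairpin candidate with the given stem length."""
    per_seq = [{loop for _, loop in find_hairpins(s, stem_len, min_loop, max_loop)}
               for s in seqs]
    return set.intersection(*per_seq) if per_seq else set()


if __name__ == "__main__":
    sequences = ["GGGGAAAACCCC", "CCGCACUUUUGUGCAA"]
    for s in sequences:
        print(s, find_hairpins(s))
    print("shared loop lengths:", shared_hairpin_shapes(sequences))
```

The two example sequences share no obvious sequence similarity in their stems, yet both contain a candidate hairpin with a 4-base-pair stem and a 4-nucleotide loop. This scan tests every window explicitly; the abstract attributes the speed of the actual method to the affix tree index, which this sketch does not attempt to reproduce.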