An algorithm for finding conserved secondary structure motifs in unaligned RNA sequences

The discovery of a variety of functions for non-coding RNA molecules has highlighted the need for suitable tools for the analysis and comparison of RNA sequences. Many trans-acting non-coding RNA genes and cis-acting RNA regulatory elements contain motifs, conserved in both structure and sequence, that can hardly be detected by primary sequence analysis alone. We present an algorithm that takes as input a set of unaligned RNA sequences expected to share a common motif and outputs the regions that are most conserved across the sequences, according to a similarity measure that takes into account both the sequence of the regions and the secondary structure they can form under base-pairing and thermodynamic rules. Only a single input parameter is needed: the number of distinct hairpins the motif must contain. No further constraints on the size, number and position of the individual elements comprising the motif are required. The algorithm has two parts. First, it extracts from each input sequence a set of candidate regions whose predicted optimal secondary structure contains the number of hairpins given as input. Then, the selected regions are compared with each other to find the groups of most similar regions, each group formed by one region taken from each sequence. To avoid exhaustive enumeration of the search space and to reduce execution time, a greedy heuristic is introduced for this task. We present several experiments showing that the algorithm is capable of characterizing and discovering known regulatory motifs in mRNA, such as the iron responsive element (IRE) and selenocysteine insertion sequence (SECIS) stem-loop structures. We also show how it can be applied to corrupted datasets in which a motif does not appear in all the input sequences, as well as to the discovery of more complex motifs in non-coding RNA.
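
The two-part procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that candidate regions have already been folded elsewhere and are given as (subsequence, dot-bracket structure) pairs. The names `count_hairpins`, `filter_candidates`, `toy_similarity` and `greedy_group` are hypothetical, the similarity function is a toy stand-in for the sequence-and-structure measure mentioned above, and the greedy scheme shown is only one plausible way to avoid enumerating every combination of regions.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (subsequence, dot-bracket secondary structure).
Region = Tuple[str, str]


def count_hairpins(structure: str) -> int:
    """Count hairpin loops: innermost '(' ... ')' pairs enclosing only unpaired bases."""
    return len(re.findall(r"\(\.+\)", structure))


def filter_candidates(regions: Sequence[Region], n_hairpins: int) -> List[Region]:
    """Part 1 (sketch): keep regions whose structure has exactly n_hairpins hairpins."""
    return [r for r in regions if count_hairpins(r[1]) == n_hairpins]


def toy_similarity(a: Region, b: Region) -> float:
    """Toy measure (assumption): fraction of positions agreeing in sequence and structure."""
    seq_a, struct_a = a
    seq_b, struct_b = b
    n = max(len(seq_a), len(seq_b))
    matches = sum(
        (x == y) + (s == t)
        for x, y, s, t in zip(seq_a, seq_b, struct_a, struct_b)
    )
    return matches / (2 * n) if n else 0.0


def greedy_group(
    candidates: Sequence[Sequence[Region]],
    sim: Callable[[Region, Region], float] = toy_similarity,
) -> List[Region]:
    """Part 2 (sketch): pick one region per sequence without exhaustive enumeration.

    Seeds the group with the most similar pair from the first two sequences,
    then adds, for each remaining sequence, the region most similar to the
    regions already chosen.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        return []
    if len(candidates) == 1:
        return [candidates[0][0]]
    seed = max(
        ((a, b) for a in candidates[0] for b in candidates[1]),
        key=lambda pair: sim(pair[0], pair[1]),
    )
    group = list(seed)
    for regions in candidates[2:]:
        best = max(regions, key=lambda r: sum(sim(r, g) for g in group))
        group.append(best)
    return group


# Toy input: two candidate regions per sequence (made-up data for illustration).
candidates = [
    [("GGGAAACCC", "(((...)))"), ("AUAUAUAUA", ".........")],
    [("GGGAAACCC", "(((...)))"), ("CCCCCCCCC", ".........")],
    [("GGGAAUCCC", "(((...)))"), ("UUUUUUUUU", ".........")],
]
group = greedy_group(candidates)
print([seq for seq, _ in group])
```

In this sketch the greedy step evaluates each candidate only against the regions already selected, so its cost grows with the total number of candidates rather than with the product of the candidate counts across all sequences. The result depends on the order of the input sequences, which is the usual trade-off of such a heuristic.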
