A new method to predict the consensus secondary structure of a set of unaligned RNA sequences

MOTIVATION To predict the consensus secondary structure, possibly including pseudoknots, of a set of unaligned RNA sequences.

RESULTS We have designed a method based on a new representation of any RNA secondary structure as a set of structural relationships between the helices of the structure. We refer to this representation as a structural pattern. In a first step, we use thermodynamic parameters to select, for each sequence, the best secondary structures according to energy minimization, and we represent each of them by its corresponding structural pattern. In a second step, we search for the repeated structural patterns, i.e. the largest structural patterns that occur in at least one sequence, meaning that they are included in at least one of the structural patterns associated with each sequence. Thanks to an efficient encoding of structural patterns, this search reduces to identifying the largest repeated word suffixes in a dictionary. In a third step, we compute the plausibility of each repeated structural pattern by checking whether it occurs more frequently in the studied sequences than in random RNA sequences. We then assume that the consensus secondary structure corresponds to the repeated structural pattern with the highest plausibility. We present several experiments on tRNA, fragments of 16S rRNA and 10Sa RNA (including pseudoknots); in each case, the method recovered the putative consensus secondary structure.
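The second step can be illustrated with a minimal Python sketch of a "largest repeated suffix" search. The abstract does not specify how structural patterns are encoded as words, so the strings below are placeholders, and the function name `largest_repeated_suffixes`, the `words_by_seq` layout and the `min_support` threshold are all assumptions made for illustration. The brute-force enumeration shown here is not the efficient procedure the authors refer to; it only shows what the search is looking for.

```python
from collections import defaultdict


def largest_repeated_suffixes(words_by_seq, min_support):
    """Return the maximal suffixes shared by words from enough sequences.

    words_by_seq: dict mapping a sequence id to a list of words, where each
        word stands for one encoded structural pattern (hypothetical encoding).
    min_support: minimum number of distinct sequences in which a suffix must
        occur (hypothetical parameter).
    """
    # Record, for every suffix of every word, which sequences contain it.
    support = defaultdict(set)
    for seq_id, words in words_by_seq.items():
        for word in words:
            for start in range(len(word)):
                support[word[start:]].add(seq_id)

    # Keep the suffixes that occur in enough distinct sequences.
    frequent = {s for s, ids in support.items() if len(ids) >= min_support}

    # A suffix is "largest" if no longer frequent suffix ends with it.
    maximal = {
        s for s in frequent
        if not any(t != s and t.endswith(s) for t in frequent)
    }
    return {s: sorted(support[s]) for s in maximal}


# Toy dictionary: letters are placeholders, not a real pattern encoding.
toy = {
    "seq1": ["abcd", "xcd"],
    "seq2": ["zbcd"],
    "seq3": ["qcd"],
}
print(largest_repeated_suffixes(toy, min_support=2))
print(largest_repeated_suffixes(toy, min_support=3))
```

Raising `min_support` trades pattern size for coverage: in the toy data, the longer suffix is shared by two sequences only, while a shorter one is shared by all three.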
