A Phylogenetic Approach to RNA Structure Prediction

Methods based on the mutual information statistic (MI methods) predict structure by detecting statistical correlations between positions in a set of aligned sequences. Although MI methods are often quite effective, they ignore the underlying phylogenetic relationships among the sequences they analyze. As a result, they cannot distinguish correlations caused by structural interactions from spurious correlations that arise from shared phylogenetic history. In this paper, we introduce a method analogous to MI that incorporates phylogenetic information. We show that this method accurately recovers the structures of well-known RNA molecules. We also demonstrate, with both real and simulated data, that this phylogenetically based method outperforms standard MI methods and improves the ability to distinguish interacting from non-interacting positions in RNA. The method is flexible and, given an appropriate evolutionary model, may also be applied to the prediction of protein structure. Because it incorporates phylogenetic data, the method can be improved as more accurate phylogenetic information becomes available, although we show that even approximate phylogenies are helpful.
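To make the baseline concrete, the following is a minimal sketch of the standard mutual information statistic between two alignment columns, which is the quantity that MI methods score. It is not the phylogenetic method introduced in this paper. The function name `mutual_information` and the four-sequence toy alignment are assumptions chosen for illustration, and the sketch ignores gaps, sequence weighting, and small-sample corrections.

```python
from collections import Counter
from math import log2


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns.

    Each column is a sequence of residues, one per aligned sequence.
    Frequencies are plain counts; no phylogenetic weighting is applied.
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_x)
    freq_x = Counter(col_x)              # single-column counts
    freq_y = Counter(col_y)
    freq_xy = Counter(zip(col_x, col_y))  # joint counts of residue pairs

    mi = 0.0
    for (a, b), count in freq_xy.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to limit rounding
        mi += p_ab * log2(p_ab * n * n / (freq_x[a] * freq_y[b]))
    return mi


# Hypothetical toy alignment: columns 0 and 3 covary as Watson-Crick
# pairs (G-C, A-U, C-G, U-A); columns 1 and 2 are invariant.
alignment = ["GCAC", "ACAU", "CCAG", "UCAA"]
columns = list(zip(*alignment))

paired = mutual_information(columns[0], columns[3])    # covarying pair
unpaired = mutual_information(columns[0], columns[1])  # invariant partner
print(f"MI(0,3) = {paired:.2f} bits, MI(0,1) = {unpaired:.2f} bits")
```

In this toy case the covarying pair scores 2 bits and the pair with an invariant column scores 0. Because the statistic treats every sequence as an independent observation, closely related sequences that share substitutions by descent can raise the score of positions that do not interact, which is the limitation a phylogenetic approach is meant to address.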
