Detecting the coevolution of biosequences--an example of RNA interaction prediction.

A probabilistic graphical model is proposed in order to detect the coevolution between different sites in biological sequences. The model extends the continuous-time Markov process of sequence substitution for single nucleic or amino acids and imposes general constraints regarding simultaneous changes on the substitution rate matrix. Given a multiple sequence alignment for each molecule of interest and a phylogenetic tree, the model can predict potential interactions within or between nucleic acids and proteins. Initial validation of the model is carried out using tRNA and 16S rRNA sequence data. The model accurately identifies the secondary interactions of tRNA as well as several known tertiary interactions. In addition, results on 16S rRNA data indicate this general and simple coevolutionary model outperforms several other parametric and nonparametric methods in predicting secondary interactions. Furthermore, the majority of the putative predictions exhibit either direct contact or proximity of the nucleotide pairs in the 3-dimensional structure of the Thermus thermophilus ribosomal small subunit. The results on RNA data suggest a general model of coevolution might be applied to other types of interactions between protein, DNA, and RNA molecules.

[1]  H. Kishino,et al.  Dating of the human-ape splitting by a molecular clock of mitochondrial DNA , 2005, Journal of Molecular Evolution.

[2]  David Haussler,et al.  Combining phylogenetic and hidden Markov models in biosequence analysis , 2003, RECOMB '03.

[3]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[4]  Peter F Stadler,et al.  Fast and reliable prediction of noncoding RNAs , 2005, Proc. Natl. Acad. Sci. USA.

[5]  E. Koonin,et al.  Conservation and coevolution in the scale-free human gene coexpression network. , 2004, Molecular biology and evolution.

[6]  Francisco C. Santos,et al.  Cooperation Prevails When Individuals Adjust Their Social Ties , 2006, PLoS Comput. Biol..

[7]  T. Earnest,et al.  Crystal Structure of the Ribosome at 5.5 Å Resolution , 2001, Science.

[8]  Ziheng Yang,et al.  PAML: a program package for phylogenetic analysis by maximum likelihood , 1997, Comput. Appl. Biosci..

[9]  Stephen Neidle,et al.  Principles of nucleic acid structure , 2007 .

[10]  W R Taylor,et al.  Coevolving protein residues: maximum likelihood identification and relationship to structure. , 1999, Journal of molecular biology.

[11]  J. Holton,et al.  Structures of the Bacterial Ribosome at 3.5 Å Resolution , 2005, Science.

[12]  Harry F Noller,et al.  RNA Structure: Reading the Ribosome , 2005, Science.

[13]  G. Gloor,et al.  Mutual information in protein multiple sequence alignments reveals two classes of coevolving positions. , 2005, Biochemistry.

[14]  Thomas W. H. Lui,et al.  Using multiple interdependency to separate functional from phylogenetic correlations in protein alignments , 2003, Bioinform..

[15]  Z. Yang,et al.  A space-time process model for the evolution of DNA sequences. , 1995, Genetics.

[16]  F. Cohen,et al.  Co-evolution of proteins with their interaction partners. , 2000, Journal of molecular biology.

[17]  P. Stadler,et al.  Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome , 2005, Nature Biotechnology.

[18]  Sean R. Eddy,et al.  Rfam: an RNA family database , 2003, Nucleic Acids Res..

[19]  C. Vonrhein,et al.  Structure of the 30S ribosomal subunit , 2000, Nature.

[20]  A. E. Hirsh,et al.  Functional genomic analysis of the rates of protein evolution. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[21]  J. Felsenstein,et al.  A Hidden Markov Model approach to variation among sites in rate of evolution. , 1996, Molecular biology and evolution.

[22]  Nan Yu,et al.  The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs , 2002, BMC Bioinformatics.

[23]  A. E. Hirsh,et al.  Evolutionary Rate in the Protein Interaction Network , 2002, Science.

[24]  S. Eddy,et al.  Computational identification of noncoding RNAs in E. coli by comparative genomics , 2001, Current Biology.

[25]  A. Jean-Marie,et al.  A model-based approach for detecting coevolving positions in a molecule. , 2005, Molecular biology and evolution.

[26]  James R. Cole,et al.  The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data , 2006, Nucleic Acids Res..

[27]  Arun K. Ramani,et al.  Exploiting the co-evolution of interacting proteins to discover interaction specificity. , 2003, Journal of molecular biology.

[28]  C R Woese,et al.  Architecture of ribosomal RNA: constraints on the sequence of "tetra-loops". , 1990, Proceedings of the National Academy of Sciences of the United States of America.

[29]  A. Rzhetsky Estimating substitution rates in ribosomal RNA genes. , 1995, Genetics.

[30]  H. Noller,et al.  Secondary structure of 16S ribosomal RNA. , 1981, Science.

[31]  S. Eddy Non–coding RNA genes and the modern RNA world , 2001, Nature Reviews Genetics.

[32]  I. Lapidus,et al.  Secondary structure of 5 S ribosomal RNA. , 1970, Journal of theoretical biology.

[33]  Diego di Bernardo,et al.  ddbRNA: detection of conserved secondary structures in multiple alignments , 2003, Bioinform..

[34]  M. P. Cummings PHYLIP (Phylogeny Inference Package) , 2004 .

[35]  B. Berger,et al.  MSARI: multiple sequence alignments for statistical detection of RNA secondary structure. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[36]  David Haussler,et al.  Identification and Classification of Conserved RNA Secondary Structures in the Human Genome , 2006, PLoS Comput. Biol..

[37]  Bjarne Knudsen,et al.  RNA secondary structure prediction using stochastic context-free grammars and evolutionary history , 1999, Bioinform..

[38]  R. Ranganathan,et al.  Evolutionarily conserved pathways of energetic connectivity in protein families. , 1999, Science.

[39]  W. Atchley,et al.  Correlations among amino acid sites in bHLH protein domains: an information theoretic analysis. , 2000, Molecular biology and evolution.

[40]  J. Felsenstein Evolutionary trees from DNA sequences: A maximum likelihood approach , 2005, Journal of Molecular Evolution.

[41]  R. Gutell,et al.  Higher order structure in ribosomal RNA. , 1986, The EMBO journal.

[42]  I Holmes,et al.  An expectation maximization algorithm for training hidden substitution models. , 2002, Journal of molecular biology.

[43]  Eric Westhof,et al.  Recurrent structural RNA motifs, Isostericity Matrices and sequence alignments , 2005, Nucleic acids research.

[44]  D. Eisenberg,et al.  Use of Logic Relationships to Decipher Protein Network Organization , 2004, Science.

[45]  V. Ramakrishnan,et al.  Structure of the 30 S ribosomal subunit , 2022 .

[46]  M. Huynen,et al.  Automatic detection of conserved RNA structure elements in complete RNA virus genomes. , 1998, Nucleic acids research.

[47]  M. Pagel Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters , 1994, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[48]  Mark Pagel,et al.  Predicting Functional Gene Links from Phylogenetic-Statistical Analyses of Whole Genomes , 2005, 2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05).

[49]  Eric Westhof,et al.  The non-Watson-Crick base pairs and their associated isostericity matrices. , 2002, Nucleic acids research.

[50]  Simon A. A. Travers,et al.  A Novel Method for Detecting Intramolecular Coevolution: Adding a Further Dimension to Selective Constraints Analyses , 2006, Genetics.