Identifying Coevolving Partners from Paralogous Gene Families

Many methods have been developed to detect coevolution from aligned sequences. However, all the existing methods require a one-to-one mapping of candidate coevolving partners (nucleotides, amino acids) a priori. When two families of sequences have distinct duplication and loss histories, finding the one-to-one mapping of coevolving partners can be computationally involved. We propose an algorithm to identify the coevolving partners from two families of sequences with distinct phylogenetic trees. The algorithm maps each gene tree to a reference species tree, and builds a joint state of sequence composition and assignments of coevolving partners for each species tree node. By applying dynamic programming on the joint states, the optimal assignments can be identified. Time complexity is quadratic to the size of the species tree, and space complexity is exponential to the maximum number of gene tree nodes mapped to the same species tree node. Analysis on both simulated data and Pfam protein domain sequences demonstrates that the paralog coevolution algorithm picks up the coevolving partners with 60%–88% accuracy. This algorithm extends phylogeny-based coevolutionary models and make them applicable to a wide range of problems such as predicting protein-protein, protein-DNA and DNA-RNA interactions of two distinct families of sequences.

[1]  S. Eddy Non–coding RNA genes and the modern RNA world , 2001, Nature Reviews Genetics.

[2]  H. Kishino,et al.  Dating of the human-ape splitting by a molecular clock of mitochondrial DNA , 2005, Journal of Molecular Evolution.

[3]  H. Noller,et al.  Secondary structure of 16S ribosomal RNA. , 1981, Science.

[4]  Simon A. A. Travers,et al.  A Novel Method for Detecting Intramolecular Coevolution: Adding a Further Dimension to Selective Constraints Analyses , 2006, Genetics.

[5]  Matthew J. Betts,et al.  Optimal Gene Trees from Sequences and Species Trees Using a Soft Interpretation of Parsimony , 2006, Journal of Molecular Evolution.

[6]  A. Rzhetsky Estimating substitution rates in ribosomal RNA genes. , 1995, Genetics.

[7]  Bjarne Knudsen,et al.  RNA secondary structure prediction using stochastic context-free grammars and evolutionary history , 1999, Bioinform..

[8]  Arun K. Ramani,et al.  Exploiting the co-evolution of interacting proteins to discover interaction specificity. , 2003, Journal of molecular biology.

[9]  G. Gloor,et al.  Mutual information in protein multiple sequence alignments reveals two classes of coevolving positions. , 2005, Biochemistry.

[10]  J. Felsenstein Evolutionary trees from DNA sequences: A maximum likelihood approach , 2005, Journal of Molecular Evolution.

[11]  M. O. Dayhoff A model of evolutionary change in protein , 1978 .

[12]  F. Cohen,et al.  Co-evolution of proteins with their interaction partners. , 2000, Journal of molecular biology.

[13]  Mark Pagel,et al.  Predicting Functional Gene Links from Phylogenetic-Statistical Analyses of Whole Genomes , 2005, 2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05).

[14]  Thomas W. H. Lui,et al.  Using multiple interdependency to separate functional from phylogenetic correlations in protein alignments , 2003, Bioinform..

[15]  W R Taylor,et al.  Coevolving protein residues: maximum likelihood identification and relationship to structure. , 1999, Journal of molecular biology.

[16]  David Haussler,et al.  Detecting the coevolution of biosequences--an example of RNA interaction prediction. , 2007, Molecular biology and evolution.

[17]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[18]  David Haussler,et al.  Detecting Coevolution in and among Protein Domains , 2007, PLoS Comput. Biol..

[19]  A. Jean-Marie,et al.  A model-based approach for detecting coevolving positions in a molecule. , 2005, Molecular biology and evolution.

[20]  M. Pagel Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters , 1994, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[21]  M. O. Dayhoff,et al.  Atlas of protein sequence and structure , 1965 .

[22]  R. Ranganathan,et al.  Evolutionarily conserved pathways of energetic connectivity in protein families. , 1999, Science.

[23]  Sean R. Eddy,et al.  A simple algorithm to infer gene duplication and speciation events on a gene tree , 2001, Bioinform..

[24]  M. O. Dayhoff,et al.  22 A Model of Evolutionary Change in Proteins , 1978 .

[25]  Bengt Sennblad,et al.  Bayesian gene/species tree reconciliation and orthology analysis using MCMC , 2003, ISMB.

[26]  R. Gutell,et al.  Higher order structure in ribosomal RNA. , 1986, The EMBO journal.

[27]  Roger B. Sidje,et al.  Expokit: a software package for computing matrix exponentials , 1998, TOMS.

[28]  Z. Yang,et al.  A space-time process model for the evolution of DNA sequences. , 1995, Genetics.

[29]  David Haussler,et al.  Identification and Classification of Conserved RNA Secondary Structures in the Human Genome , 2006, PLoS Comput. Biol..

[30]  Harry F Noller,et al.  RNA Structure: Reading the Ribosome , 2005, Science.

[31]  W. Atchley,et al.  Correlations among amino acid sites in bHLH protein domains: an information theoretic analysis. , 2000, Molecular biology and evolution.