Detecting Coevolution in and among Protein Domains

Correlated changes in nucleic acid or amino acid sequences provide strong evidence about the structures and interactions of molecules. Despite the rich literature on coevolutionary sequence analysis, existing methods often must trade off among generality, simplicity, use of phylogenetic information, and reliance on specific knowledge about interactions. Furthermore, although coevolution has been demonstrated in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporates phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We apply this model in large-scale screenings of the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests performed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins or protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest that sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints merit deeper investigation.
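The abstract does not give the model's equations, so the following is only an illustrative sketch of what "augmenting" a continuous-time Markov process with one extra parameter could look like, not the authors' construction. It assumes a toy two-letter alphabet, a hypothetical single-site rate matrix `q`, and a hypothetical parameter `r` that adds simultaneous double substitutions to an otherwise independent two-site model; setting `r = 0` recovers independent evolution of the two sites. All function names are invented for this example.

```python
from itertools import product


def pair_generator(q, r=0.0):
    """Rate matrix over ordered pairs of states at two sites.

    q : single-site rate matrix (rows sum to zero); an assumed toy input.
    r : hypothetical extra parameter scaling simultaneous double changes.
        r = 0 gives the independent (null) two-site model.
    """
    n = len(q)
    states = list(product(range(n), repeat=2))  # pair (a, b) has index a*n + b
    size = len(states)
    big = [[0.0] * size for _ in range(size)]
    for i, (a, b) in enumerate(states):
        for j, (c, d) in enumerate(states):
            if i == j:
                continue
            if a != c and b == d:      # only the first site changes
                big[i][j] = q[a][c]
            elif a == c and b != d:    # only the second site changes
                big[i][j] = q[b][d]
            else:                      # both sites change at once
                big[i][j] = r * q[a][c] * q[b][d]
        big[i][i] = -sum(big[i])       # diagonal makes the row sum to zero
    return big


def matmul(x, y):
    """Plain-Python product of two square matrices given as lists of rows."""
    cols = list(zip(*y))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in x]


def transition_matrix(rate, t, squarings=10, terms=12):
    """Approximate exp(rate * t) by a Taylor series with scaling and squaring."""
    n = len(rate)
    scale = t / (2 ** squarings)
    a = [[v * scale for v in row] for row in rate]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, a)]
        result = [[p + s for p, s in zip(r1, r2)] for r1, r2 in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


# Toy single-site rates (assumed values, not taken from the paper).
q = [[-1.0, 1.0],
     [2.0, -2.0]]
null_model = transition_matrix(pair_generator(q, r=0.0), t=0.5)
coupled_model = transition_matrix(pair_generator(q, r=0.5), t=0.5)
```

In a likelihood-ratio style comparison, `null_model` and `coupled_model` would supply the branch transition probabilities for the independent and the coupled hypotheses; how the paper actually parameterizes and tests the coupling is not stated in the abstract.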
