Graphical models of protein–protein interaction specificity from correlated mutations and interaction data

Protein–protein interactions are mediated by complementary amino acids defining complementary surfaces. Typically not all members of a family of related proteins interact equally well with all members of a partner family; thus analysis of the sequence record can reveal the complementary amino acid partners that confer interaction specificity. This article develops methods for learning and using probabilistic graphical models of such residue “cross‐coupling” constraints between interacting protein families, based on multiple sequence alignments and information about which pairs of proteins are known to interact. Our models generalize traditional consensus‐sequence binding motifs and provide a probabilistic semantics that enables sound evaluation of the plausibility of new possible interactions. Furthermore, predictions made by the models can be explained in terms of the underlying residue interactions. Our approach supports different levels of prior knowledge regarding interactions, including both one‐to‐one (e.g., pairs of proteins from the same organism) and many‐to‐many (e.g., experimentally identified interactions), and we present a technique to account for possible bias in the represented interactions. We apply our approach in studies of PDZ domains and their ligands, fundamental building blocks in a number of protein assemblies. Our algorithms identify biologically interesting cross‐coupling constraints, recover known interactions, and make explainable predictions about novel interactions. Proteins 2009. © 2009 Wiley‐Liss, Inc.
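
The scoring idea summarized above can be illustrated with a deliberately simplified sketch. This is not the paper's graphical-model learning procedure: it estimates pseudocounted joint residue frequencies for every (domain column, ligand column) pair from aligned, known‐interacting pairs, ranks column pairs by mutual information as candidate cross‐coupling edges, and scores a new pair by summing pointwise log‐odds over the selected edges. The function names, the pseudocount of 0.01, the toy sequences, and the degree‐based weighting used as a stand‐in for bias correction in many‐to‐many data are all hypothetical assumptions introduced for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def pair_weights(pairs):
    # ASSUMPTION: down-weight promiscuous proteins by 1/sqrt(deg(domain) * deg(ligand)).
    # This is a hypothetical stand-in for bias correction, not the paper's technique.
    deg_dom = Counter(dom for dom, _ in pairs)
    deg_lig = Counter(lig for _, lig in pairs)
    return [1.0 / math.sqrt(deg_dom[dom] * deg_lig[lig]) for dom, lig in pairs]


def coupling_tables(pairs, weights, alphabet, pseudo=0.01):
    """Weighted, pseudocounted joint frequencies P(a_i, b_j) for every cross-family column pair."""
    n_dom, n_lig = len(pairs[0][0]), len(pairs[0][1])
    denom = sum(weights) + pseudo * len(alphabet) ** 2
    tables = {}
    for i, j in product(range(n_dom), range(n_lig)):
        counts = defaultdict(float)
        for (dom, lig), w in zip(pairs, weights):
            counts[(dom[i], lig[j])] += w
        tables[(i, j)] = {(a, b): (counts[(a, b)] + pseudo) / denom
                          for a in alphabet for b in alphabet}
    return tables


def marginals(joint, alphabet):
    """Domain-side (row) and ligand-side (column) marginals of one joint table."""
    p_dom = {a: sum(joint[(a, b)] for b in alphabet) for a in alphabet}
    p_lig = {b: sum(joint[(a, b)] for a in alphabet) for b in alphabet}
    return p_dom, p_lig


def mutual_information(joint, alphabet):
    """Mutual information between one domain column and one ligand column (nats)."""
    p_dom, p_lig = marginals(joint, alphabet)
    return sum(p * math.log(p / (p_dom[a] * p_lig[b])) for (a, b), p in joint.items())


def top_edges(tables, alphabet, k):
    """Hypothetical edge selection: keep the k column pairs with the highest mutual information."""
    ranked = sorted(tables, key=lambda e: mutual_information(tables[e], alphabet), reverse=True)
    return ranked[:k]


def score(candidate, tables, edges, alphabet):
    """Sum over selected edges of log P(a, b) / (P(a) P(b)); > 0 favours the pairing."""
    dom, lig = candidate
    total = 0.0
    for i, j in edges:
        joint = tables[(i, j)]
        p_dom, p_lig = marginals(joint, alphabet)
        a, b = dom[i], lig[j]
        total += math.log(joint[(a, b)] / (p_dom[a] * p_lig[b]))
    return total


# Toy data (hypothetical): domain column 0 (K/E) pairs with a charge-complementary
# ligand column 0 (E/K); domain column 1 and ligand column 1 carry no coupling.
pairs = [("KA", "EV"), ("KG", "EV"), ("EA", "KV"), ("EG", "KV")]
alphabet = sorted({c for dom, lig in pairs for c in dom + lig})
weights = pair_weights(pairs)
tables = coupling_tables(pairs, weights, alphabet)
edges = top_edges(tables, alphabet, k=1)  # [(0, 0)]
print(edges)
print(score(("KA", "EV"), tables, edges, alphabet))  # complementary pairing: positive
print(score(("KA", "KV"), tables, edges, alphabet))  # like-charge pairing: negative
```

Because the score is a sum of per‐edge terms, each prediction can be traced back to the residue pairs that drive it, which mirrors the kind of explanation the abstract describes; a full graphical model would also learn the edge structure and parameters jointly rather than ranking column pairs one at a time.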