Learning Sequence Determinants of Protein:Protein Interaction Specificity with Sparse Graphical Models

In studying the strength and specificity of interaction between members of two protein families, key questions center on which pairs of possible partners actually interact, how well they interact, and why they interact while others do not. The advent of large-scale experimental studies of interactions between members of a target family and a diverse set of possible interaction partners offers the opportunity to address these questions. We develop here a method, DgSpi (data-driven graphical models of specificity in protein:protein interactions), for learning and using graphical models that explicitly represent the amino acid basis for interaction specificity (why) and that extend earlier classification-oriented approaches (which) to predict the ΔG of binding (how well). We demonstrate the effectiveness of our approach in analyzing and predicting interactions of a set of 82 PDZ recognition modules with a panel of 217 possible peptide partners, based on data from MacBeath and colleagues. Our predicted ΔG values correlate well with the experimentally measured ones, reaching correlation coefficients of 0.69 in 10-fold cross-validation and 0.63 in leave-one-PDZ-out cross-validation. Furthermore, the model serves as a compact representation of the amino acid constraints underlying the interactions, so that protein-level ΔG predictions can be understood directly in terms of residue-level constraints. Finally, DgSpi readily enables the design of new interacting partners, and we demonstrate that the designed ligands are novel and diverse.
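The residue-level decomposition of ΔG and the leave-one-PDZ-out evaluation described above can be illustrated with a small sketch. It is not the DgSpi model. It assumes each PDZ is summarized by a short string of binding-pocket residues and each peptide by a short string of residues. It also assumes ΔG is a bias plus sparse weights on hypothetical (PDZ position, peptide position, PDZ residue, peptide residue) edge features, fit by L1-regularized least squares with ISTA (proximal gradient descent). ISTA is a swapped-in stand-in for whatever sparse learning procedure the paper actually uses. The toy sequences and the `toy_dg` ground truth are invented and are not the MacBeath data. The names `pair_features`, `fit_sparse`, `predict`, `pearson`, and `leave_one_pdz_out` are hypothetical helpers.

```python
# Hypothetical sketch, not the published DgSpi model: dG is modeled as a bias
# plus sparse weights on (PDZ position, peptide position, PDZ residue,
# peptide residue) indicator features, fit by L1-regularized least squares
# with ISTA (proximal gradient descent).
from itertools import product
from math import sqrt


def pair_features(pdz, pep):
    """Active edge features: one (i, j, a, b) tuple per PDZ/peptide position pair."""
    return [(i, j, a, b) for i, a in enumerate(pdz) for j, b in enumerate(pep)]


def soft_threshold(x, t):
    """Proximal operator of t * |x|, i.e. the L1 shrinkage step."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_sparse(data, lam=0.01, iters=3000):
    """Minimize (1/2n) * sum(residual^2) + lam * ||w||_1 by ISTA; the bias is unpenalized."""
    feats = [pair_features(pdz, pep) for pdz, pep, _ in data]
    ys = [dg for _, _, dg in data]
    n = len(data)
    bias, w = sum(ys) / n, {}
    # Each example has at most max_active binary features plus the bias, so
    # max_active + 1 bounds the Lipschitz constant of the smooth gradient.
    step = 1.0 / (max(len(f) for f in feats) + 1)
    for _ in range(iters):
        grad_b, grad_w = 0.0, {}
        for f, y in zip(feats, ys):
            r = bias + sum(w.get(k, 0.0) for k in f) - y
            grad_b += r / n
            for k in f:
                grad_w[k] = grad_w.get(k, 0.0) + r / n
        bias -= step * grad_b
        w = {k: soft_threshold(w.get(k, 0.0) - step * g, step * lam) for k, g in grad_w.items()}
        w = {k: v for k, v in w.items() if v != 0.0}  # keep the model sparse
    return bias, w


def predict(model, pdz, pep):
    """Predicted dG: bias plus the weights of the active edge features."""
    bias, w = model
    return bias + sum(w.get(k, 0.0) for k in pair_features(pdz, pep))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def leave_one_pdz_out(data, **fit_kwargs):
    """Hold out all pairs of one PDZ per fold; pooling held-out predictions is an assumption."""
    preds, truth = [], []
    for held in sorted({pdz for pdz, _, _ in data}):
        model = fit_sparse([d for d in data if d[0] != held], **fit_kwargs)
        for pdz, pep, dg in data:
            if pdz == held:
                preds.append(predict(model, pdz, pep))
                truth.append(dg)
    return pearson(preds, truth)


def toy_dg(pdz, pep):
    """Invented ground truth with two coupled residue pairs (not real PDZ data)."""
    return -5.0 - 2.0 * (pdz[0] == "H" and pep[1] == "S") - 1.0 * (pdz[1] == "L" and pep[0] == "V")


pdzs = ["".join(p) for p in product("HK", "LA")]
peps = ["".join(p) for p in product("VE", "ST")]
data = [(d, p, toy_dg(d, p)) for d in pdzs for p in peps]

model = fit_sparse(data)
train_r = pearson([predict(model, d, p) for d, p, _ in data], [dg for _, _, dg in data])
cv_r = leave_one_pdz_out(data)
print(f"nonzero edge weights: {len(model[1])}, train r = {train_r:.3f}, leave-one-PDZ-out r = {cv_r:.3f}")
```

Each fold excludes every interaction of one PDZ from training. The pooled correlation therefore reflects generalization to PDZ sequences the model has not seen, which parallels the distinction the abstract draws between 10-fold and leave-one-PDZ-out cross-validation. The L1 penalty drives most edge weights to zero. On this toy data, the surviving weights should concentrate on the two coupled residue pairs built into `toy_dg`, a small-scale analogue of reading ΔG predictions in terms of residue-level constraints.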

[1] Hiroshi Mamitsuka, et al. Toward more accurate pan-specific MHC-peptide binding prediction: a review of current methods and tools, 2011, Briefings in Bioinformatics.

[2] Thomas A. Hopf, et al. Protein 3D Structure Computed from Evolutionary Sequence Variation, 2011, PLoS ONE.

[3] Lenore Cowen, et al. Markov random fields reveal an N-terminal double beta-propeller motif as part of a bacterial hybrid two-component sensor system, 2010, Proceedings of the National Academy of Sciences.

[4] L. Serrano, et al. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations, 2002, Journal of Molecular Biology.

[5] Jessica H. Fong, et al. Predicting specificity in bZIP coiled-coil protein interactions, 2004, Genome Biology.

[6] O. Lund, et al. The Immune Epitope Database and Analysis Resource: From Vision to Blueprint, 2005, PLoS Biology.

[7] Chris Sander, et al. A Specificity Map for the PDZ Domain Family, 2008, PLoS Biology.

[8] M. Teresa Pisabarro, et al. Analysis of PDZ Domain-Ligand Interactions Using Carboxyl-terminal Phage Display, 2000, The Journal of Biological Chemistry.

[9] Chris Bailey-Kellogg, et al. Graphical models of protein–protein interaction specificity from correlated mutations and interaction data, 2009, Proteins.

[10] Hans D. Mittelmann, et al. MultiRTA: A simple yet reliable method for predicting peptide binding affinities for multiple class II MHC allotypes, 2010, BMC Bioinformatics.

[11] Gary D. Bader, et al. A regression framework incorporating quantitative and negative interaction data improves quantitative prediction of PDZ domain–peptide interaction from primary sequence, 2010, Bioinformatics.

[12] Christopher James Langmead, et al. Learning generative models of molecular dynamics, 2012, BMC Genomics.

[13] Gevorg Grigoryan, et al. Design of protein-interaction specificity affords selective bZIP-binding peptides, 2009, Nature.

[14] S. Fields, et al. A novel genetic system to detect protein–protein interactions, 1989, Nature.

[15] Sivaraman Balakrishnan, et al. Learning generative models for protein fold families, 2011, Proteins.

[16] Chris Bailey-Kellogg, et al. Graphical Models of Residue Coupling in Protein Families, 2008, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[17] Eric P. Xing, et al. Free Energy Estimates of All-Atom Protein Structures Using Generalized Belief Propagation, 2007, RECOMB.

[18] M. Sheng, et al. PDZ domains and the organization of supramolecular complexes, 2001, Annual Review of Neuroscience.

[19] Eric P. Xing, et al. Approximating Correlated Equilibria using Relaxations on the Marginal Polytope, 2011, ICML.

[20] Alexei Kurakin, et al. The PDZ Domain as a Complex Adaptive System, 2007, PLoS ONE.

[21] Massimiliano Pontil, et al. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments, 2012, Bioinformatics.

[22] Mark W. Schmidt, et al. Optimizing Costly Functions with Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm, 2009, AISTATS.

[23] Jaime G. Carbonell, et al. Conditional Graphical Models for Protein Structural Motif Recognition, 2009, Journal of Computational Biology.

[24] Tanja Kortemme, et al. Structure-based prediction of the peptide sequence space recognized by natural and synthetic PDZ domains, 2010, Journal of Molecular Biology.

[25] Gavin MacBeath, et al. Predicting PDZ domain–peptide interactions from primary sequences, 2008, Nature Biotechnology.

[26] John Sidney, et al. A Systematic Assessment of MHC Class II Peptide Binding Predictions and Evaluation of a Consensus Approach, 2008, PLoS Computational Biology.

[27] Bonnie Berger, et al. A Parameterized Algorithm for Protein Structure Alignment, 2007, Journal of Computational Biology.

[28] N. Meinshausen, et al. High-dimensional graphs and variable selection with the Lasso, 2006, arXiv:math/0608017.

[29] D. Baker, et al. Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era, 2013, Proceedings of the National Academy of Sciences.

[30] Timothy Nugent, et al. Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis, 2012, Proceedings of the National Academy of Sciences.

[31] O. Lund, et al. NetMHCpan, a Method for Quantitative Predictions of Peptide Binding to Any HLA-A and -B Locus Protein of Known Sequence, 2007, PLoS ONE.

[32] Jiunn R. Chen, et al. PDZ Domain Binding Selectivity Is Optimized Across the Mouse Proteome, 2007, Science.

[33] Chris Bailey-Kellogg, et al. Analysis of sequence–reactivity space for protein–protein interactions, 2004, Proteins.

[34] C. Sander, et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families, 2011, Proceedings of the National Academy of Sciences.

[35] Kalyan C. Tirupula, et al. A minimal ligand binding pocket within a network of correlated mutations identified by multiple sequence and structural analysis of G protein coupled receptors, 2012, BMC Biophysics.

[36] Chris Bailey-Kellogg, et al. Protein Design by Sampling an Undirected Graphical Model of Residue Constraints, 2009, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[37] Chris Bailey-Kellogg, et al. Modeling and inference of sequence-structure specificity, 2009.

[38] M. Helmer-Citterich, et al. SH3-SPOT: an algorithm to predict preferred ligands to different members of the SH3 gene family, 2000, Journal of Molecular Biology.

[39] C. Langmead, et al. Accounting for conformational entropy in predicting binding free energies of protein‐protein interactions, 2011, Proteins.

[40] Nicole Caspers, et al. A thermodynamic ligand binding study of the third PDZ domain (PDZ3) from the mammalian neuronal protein PSD-95, 2007, Biochemistry.

[41] R. Ranganathan, et al. Evolutionarily conserved pathways of energetic connectivity in protein families, 1999, Science.

[42] D. Baker, et al. A simple physical model for binding energy hot spots in protein–protein complexes, 2002, Proceedings of the National Academy of Sciences.

[43] S. Anderson, et al. Predicting the reactivity of proteins from their sequence alone: Kazal family of protein inhibitors of serine proteinases, 2001, Proceedings of the National Academy of Sciences.