Identification of direct residue contacts in protein–protein interaction by message passing

Understanding the molecular determinants of specificity in protein–protein interaction is an outstanding challenge of postgenome biology. The availability of large protein databases generated from the sequences of hundreds of bacterial genomes enables various statistical approaches to this problem. In this context, covariance-based methods have been used to identify correlations between amino acid positions in interacting proteins. However, these methods have an important shortcoming: they cannot distinguish between directly and indirectly correlated residues. We developed a method that combines covariance analysis with global inference analysis adopted from statistical physics. Applied to a set of >2,500 representatives of the bacterial two-component signal transduction system, the combination of covariance with global inference successfully and robustly identified residue pairs that are proximal in space, without resorting to ad hoc tuning parameters. It did so both for heterointeractions between sensor kinase (SK) and response regulator (RR) proteins and for homointeractions between RR proteins. The success of this approach illustrates the effectiveness of global inference in identifying direct interactions based on sequence information alone. We expect this method to become applicable soon to interaction surfaces between proteins present in only one copy per genome, as the number of sequenced genomes continues to expand. Use of this method could significantly increase the number of potential targets for therapeutic intervention, shed light on the mechanism of protein–protein interaction, and establish the foundation for the accurate prediction of interacting protein partners.
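Why a local covariance score cannot separate direct from indirect correlations, while a global model can, is easiest to see on a toy example. The sketch below makes several assumptions that are not in the source. It uses three binary columns (A = +1, B = -1) instead of 21-state amino acid columns. The "alignment" is hypothetical, built so that column 0 couples to column 2 only through column 1. Mutual information (MI) is used as the covariance score. Finally, the global-inference step is naive mean-field Ising inversion, J_ij = -(C^-1)_ij. That is a simpler stand-in swapped in for illustration, not the message-passing procedure named in the title. The names `toy_alignment`, `mutual_information`, and `mean_field_couplings` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

FLIP = {"A": "B", "B": "A"}


def toy_alignment():
    """Hypothetical 3-column binary alignment (assumption, not real data).

    Column 1 copies column 0 with probability 3/4; column 2 copies column 1
    with probability 3/4. Columns 0 and 2 are only indirectly coupled.
    All 32 outcomes are enumerated once, so empirical = exact statistics.
    """
    seqs = []
    for x0, k1, k2 in product("AB", range(4), range(4)):
        x1 = x0 if k1 < 3 else FLIP[x0]  # keep in 3 of 4 cases
        x2 = x1 if k2 < 3 else FLIP[x1]
        seqs.append(x0 + x1 + x2)
    return seqs


def mutual_information(msa, i, j):
    """Local covariance score: MI (nats) between columns i and j."""
    n = len(msa)
    fi = Counter(s[i] for s in msa)
    fj = Counter(s[j] for s in msa)
    fij = Counter((s[i], s[j]) for s in msa)
    return sum(
        (c / n) * math.log((c / n) / ((fi[a] / n) * (fj[b] / n)))
        for (a, b), c in fij.items()
    )


def spin_correlations(msa):
    """Connected correlations C_ij = <s_i s_j> - <s_i><s_j>, with A=+1, B=-1."""
    spins = [[1.0 if ch == "A" else -1.0 for ch in s] for s in msa]
    n, L = len(spins), len(spins[0])
    mean = [sum(row[i] for row in spins) / n for i in range(L)]
    return [[sum(row[i] * row[j] for row in spins) / n - mean[i] * mean[j]
             for j in range(L)] for i in range(L)]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    L = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(L)]
         for i, row in enumerate(m)]
    for col in range(L):
        piv = max(range(col, L), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(L):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[L:] for row in a]


def mean_field_couplings(msa):
    """Global inference stand-in: naive mean-field couplings J_ij = -(C^-1)_ij."""
    cinv = invert(spin_correlations(msa))
    L = len(cinv)
    return [[-cinv[i][j] if i != j else 0.0 for j in range(L)]
            for i in range(L)]


msa = toy_alignment()
print(round(mutual_information(msa, 0, 1), 4))  # → 0.1308
print(round(mutual_information(msa, 0, 2), 4))  # → 0.0316 (indirect, yet nonzero)
J = mean_field_couplings(msa)
print(round(J[0][1], 4))  # → 0.6667 (direct pair)
print(abs(J[0][2]) < 1e-9)  # → True (indirect pair gets no coupling)
```

In this toy case, MI assigns a nonzero score to the indirect pair (0, 2). The inverted correlation matrix gives it zero coupling, because the chain's correlations are fully explained by the two direct links. Real alignments need pseudocounts, sequence reweighting, and many more states per column, none of which this sketch attempts.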
