A multi-scale coevolutionary approach to predict interactions between protein domains

Interacting proteins and protein domains coevolve on multiple scales, from their correlated presence across species to correlations in amino-acid usage. Genomic databases provide rapidly growing data on the variability of genomic protein content and of protein sequences, calling for computational predictions of unknown interactions. We first introduce the concept of direct phyletic couplings, based on global statistical models of phylogenetic profiles. They strongly increase the accuracy of predicting pairs of related protein domains beyond simpler correlation-based approaches like phylogenetic profiling (80% vs. 30-50% positives out of the 1000 highest-scoring pairs). Combining them with the direct coupling analysis of inter-protein residue-residue coevolution, we provide multi-scale evidence for direct but so far unknown interactions between protein families. An in-depth discussion shows these to be biologically sensible and directly experimentally testable. Negative phyletic couplings highlight alternative solutions for the same functionality, including documented cases of convergent evolution. Our work thereby demonstrates the strong potential of global statistical modeling approaches for genome-wide coevolutionary analysis, far beyond their established use for individual protein complexes and domain-domain interactions.

Author summary

Interactions between proteins and their domains are at the basis of almost all biological processes. To complement labor-intensive and error-prone experimental approaches to the genome-scale characterization of such interactions, we propose a computational approach based upon rapidly growing protein-sequence databases. To maintain their interaction in the course of evolution, proteins and their domains are required to coevolve: evolutionary changes in the interaction partners appear correlated across several scales, from correlated presence-absence patterns of proteins across species to correlations in amino-acid usage. Our approach combines these different scales within a common mathematical-statistical inference framework, which is inspired by the so-called direct coupling analysis. It is able to predict currently unknown but biologically sensible interactions, and to identify cases of convergent evolution leading to alternative solutions for a common biological task. Our work thereby illustrates the potential of global statistical inference for the genome-scale coevolutionary analysis of interacting proteins and protein domains.
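The difference between plain profile correlations and couplings from a global model can be illustrated on toy data. The following is a minimal sketch, not the inference procedure used in this work: it assumes a naive mean-field approximation, in which couplings are read off the negative inverse of a regularized covariance matrix of the presence-absence profiles. The function names, the pseudocount value, and the three simulated domains are all hypothetical. Domain A influences B and B influences C, so A and C are correlated only indirectly.

```python
import numpy as np

def profile_correlations(profiles):
    """Pearson correlation between the columns (domains) of a species x domain 0/1 matrix."""
    return np.corrcoef(profiles, rowvar=False)

def mean_field_couplings(profiles, pseudocount=0.01):
    """Illustrative mean-field couplings: negative inverse of a regularized covariance.

    The pseudocount mixes the empirical frequencies with those of a uniform,
    independent distribution so that the covariance matrix stays invertible.
    """
    n_species = profiles.shape[0]
    f_i = (1 - pseudocount) * profiles.mean(axis=0) + pseudocount * 0.5
    f_ij = (1 - pseudocount) * (profiles.T @ profiles) / n_species + pseudocount * 0.25
    np.fill_diagonal(f_ij, f_i)  # a domain always co-occurs with itself
    cov = f_ij - np.outer(f_i, f_i)
    couplings = -np.linalg.inv(cov)
    np.fill_diagonal(couplings, 0.0)  # self-couplings are not of interest here
    return couplings

# Toy data (assumption): B copies A with 10% flips, C copies B with 10% flips.
rng = np.random.default_rng(0)
n_species = 2000
a = rng.integers(0, 2, n_species)
b = np.where(rng.random(n_species) < 0.1, 1 - a, a)
c = np.where(rng.random(n_species) < 0.1, 1 - b, b)
profiles = np.column_stack([a, b, c])

corr = profile_correlations(profiles)
J = mean_field_couplings(profiles)

print("correlation A-B, B-C, A-C:", corr[0, 1], corr[1, 2], corr[0, 2])
print("coupling    A-B, B-C, A-C:", J[0, 1], J[1, 2], J[0, 2])
```

In this toy chain the A-C correlation remains sizeable because it is mediated by B, whereas the A-C coupling is small compared with the A-B and B-C couplings. This is the sense in which a global model can separate direct from indirect associations.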
