Constructing sequence‐dependent protein models using coevolutionary information

Recent developments in global statistical methodologies have advanced the analysis of large collections of protein sequences for coevolutionary information. Coevolution between amino acids in a protein arises from compensatory mutations that are needed to maintain the stability or function of a protein over the course of evolution. This gives rise to quantifiable correlations between amino acid sites within the multiple sequence alignment of a protein family. Here, we use the maximum entropy‐based approach called mean field Direct Coupling Analysis (mfDCA) to infer a Potts model Hamiltonian governing the correlated mutations in a protein family. We use the inferred pairwise statistical couplings to generate the sequence‐dependent heterogeneous interaction energies of a structure‐based model (SBM) where only native contacts are considered. Considering the ribosomal S6 protein and its circular permutants as well as the SH3 protein, we demonstrate that these models quantitatively agree with experimental data on folding mechanisms. This work serves as a new framework for generating coevolutionary data‐enriched models that can potentially be used to engineer key functional motions and novel interactions in protein systems.
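
As a rough illustration of the inference step described above, the following is a minimal sketch of mean field DCA on a toy alignment, not the authors' implementation. It assumes the alignment is already encoded as integers 0..q-1, uses an unweighted pseudocount (no sequence reweighting), and fixes the gauge by dropping the last state; the function names `mf_dca_couplings` and `native_contact_couplings` are hypothetical. The mapping from couplings to SBM contact energies is not specified in this abstract, so the second helper only reads off the raw coupling of each native residue pair as a placeholder.

```python
import numpy as np

def mf_dca_couplings(msa, q, pseudocount=0.5):
    """Mean field DCA sketch: msa is an (M, L) integer array with states 0..q-1.

    Returns an (L, L, q, q) array e, where e[i, j, a, b] is the inferred
    coupling between state a at site i and state b at site j (i != j).
    Blocks with i == j are by-products of the inversion, not couplings.
    """
    msa = np.asarray(msa)
    M, L = msa.shape
    onehot = np.eye(q)[msa]                                   # (M, L, q)

    # Empirical single-site and pairwise frequencies.
    f_i = onehot.mean(axis=0)                                 # (L, q)
    f_ij = np.einsum("mia,mjb->ijab", onehot, onehot) / M     # (L, L, q, q)

    # Pseudocount regularization (assumed form: mixture with a uniform model).
    f_i = (1 - pseudocount) * f_i + pseudocount / q
    f_ij = (1 - pseudocount) * f_ij + pseudocount / q**2
    for i in range(L):
        f_ij[i, i] = np.diag(f_i[i])                          # same-site block

    # Connected correlations, with the last state dropped as a gauge choice.
    C = f_ij - np.einsum("ia,jb->ijab", f_i, f_i)
    n = L * (q - 1)
    C = C[:, :, :q - 1, :q - 1].transpose(0, 2, 1, 3).reshape(n, n)

    # Mean field approximation: couplings are minus the inverse correlation matrix.
    e_flat = -np.linalg.inv(C)
    e = np.zeros((L, L, q, q))
    e[:, :, :q - 1, :q - 1] = e_flat.reshape(L, q - 1, L, q - 1).transpose(0, 2, 1, 3)
    return e

def native_contact_couplings(e, native_seq, contacts):
    """Placeholder mapping (an assumption): raw coupling of each native pair."""
    return {(i, j): e[i, j, native_seq[i], native_seq[j]] for i, j in contacts}

# Toy alignment: sites 0 and 1 always agree, site 2 varies independently.
toy_msa = np.array([[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]])
e = mf_dca_couplings(toy_msa, q=2)
weights = native_contact_couplings(e, native_seq=[0, 0, 0], contacts=[(0, 1), (0, 2)])
```

In this toy case the coupling for the co-varying pair (0, 1) comes out positive, while the pair (0, 2) receives essentially zero coupling. A real application would also need sequence reweighting and a much larger alignment.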
