Coevolutionary signals across protein lineages help capture multiple protein conformations

Significance

We show that directly coupled evolutionary residue pairs provide a distinct footprint of conformational diversity in protein families. This footprint appears as competing residue contacts, each unique to one configuration of a protein that adopts multiple conformations. We demonstrate that combining this information with physical models of proteins is sufficient to uncover conformational diversity in several protein families. We discovered that such directly coupled residues not only allow us to transition accurately between apo and holo conformations but also help uncover intermediate states that are not easily accessible to experimental or other computational methods. This enhanced sampling of the functional conformational space of proteins may have broad implications for protein structure determination and for the design of ligands that can trap intermediate states.

Abstract

A long-standing problem in molecular biology is the determination of the complete functional conformational landscape of proteins. This landscape includes not only proteins' native structures but also all of their functional states, including functionally important intermediates. Here, we reveal a signature of functionally important states in several protein families using direct coupling analysis (DCA), which detects coevolution between residue pairs from the composition of protein sequences. This signature is exploited in a structure-based model to uncover conformational diversity, including hidden functional configurations. We uncovered, with high resolution (mean ∼1.9 Å rmsd for nonapo structures), the structures of different functional states of medium to large proteins (200–450 aa) belonging to several distinct families. The combination of DCA and the structure-based model also predicts several intermediates or hidden states of functional importance. This enhanced sampling is broadly applicable and has direct implications for protein structure determination and for the design of ligands or drugs that trap intermediate states.
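Direct coupling analysis infers pairwise couplings from a multiple sequence alignment and ranks residue pairs by coupling strength. Below is a minimal mean-field sketch, assuming an integer-encoded alignment with states 0 to q−1. It scores pairs with the Frobenius norm plus average product correction (APC) rather than the direct-information score of the original mean-field DCA formulation, and it omits sequence reweighting. The helper name `mf_dca_scores`, the 0.5 pseudocount weight, and the toy alignment are illustrative assumptions, not the pipeline used in this work.

```python
import numpy as np


def mf_dca_scores(msa: np.ndarray, q: int, pseudocount: float = 0.5) -> np.ndarray:
    """Hypothetical helper: (L, L) APC-corrected coupling scores from an (M, L) MSA."""
    m, length = msa.shape
    x = np.eye(q)[msa]                                   # one-hot, shape (M, L, q)
    fi = x.mean(axis=0)                                  # single-site frequencies (L, q)
    fij = np.einsum("mia,mjb->iajb", x, x) / m           # pair frequencies (L, q, L, q)

    # Mix empirical frequencies with a uniform distribution (pseudocount).
    fi = (1.0 - pseudocount) * fi + pseudocount / q
    fij = (1.0 - pseudocount) * fij + pseudocount / q**2
    for i in range(length):
        fij[i, :, i, :] = np.diag(fi[i])                 # keep diagonal blocks consistent

    # Connected correlations; drop the last state to fix the gauge, then invert.
    cov = fij - np.einsum("ia,jb->iajb", fi, fi)
    r = q - 1
    cov = cov[:, :r, :, :r].reshape(length * r, length * r)
    e = np.zeros((length, q, length, q))
    e[:, :r, :, :r] = -np.linalg.inv(cov).reshape(length, r, length, r)

    # Shift couplings to the zero-sum gauge before taking norms.
    e = (e - e.mean(axis=1, keepdims=True) - e.mean(axis=3, keepdims=True)
         + e.mean(axis=(1, 3), keepdims=True))
    fn = np.sqrt((e**2).sum(axis=(1, 3)))                # Frobenius norm per pair (L, L)
    np.fill_diagonal(fn, 0.0)

    # Average product correction removes background signal shared by each column.
    row_mean = fn.sum(axis=1) / (length - 1)
    total_mean = fn.sum() / (length * (length - 1))
    apc = fn - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(apc, 0.0)
    return apc


# Toy alignment: random columns over a 4-letter alphabet, with columns 1 and 5
# made identical so that they covary perfectly.
rng = np.random.default_rng(1)
msa = rng.integers(0, 4, size=(1000, 8))
msa[:, 5] = msa[:, 1]
scores = mf_dca_scores(msa, q=4)
top = np.unravel_index(np.argmax(scores), scores.shape)
print(sorted(int(k) for k in top))  # the planted pair is expected to rank first
```

The pseudocount keeps the covariance matrix invertible even when two columns are perfectly correlated, which is why the planted pair in the toy alignment can be scored at all.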
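The footprint described above rests on competing contacts: strongly coupled residue pairs that are in contact in one conformation but not in another. A minimal sketch of that classification follows, assuming Cα coordinates for two conformations (for example apo and holo) and a list of top-ranked DCA pairs. The 8 Å Cα cutoff, the minimum sequence separation of 4, the helper names, and the toy hairpin coordinates are illustrative assumptions rather than the criteria used in this work.

```python
import numpy as np


def contact_map(coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Boolean (L, L) contact map from (L, 3) C-alpha coordinates (assumed cutoff in Å)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return dist < cutoff


def classify_pairs(pairs, coords_a, coords_b, cutoff=8.0, min_separation=4):
    """Hypothetical helper: label each pair by the conformation(s) in which it is a contact."""
    cm_a = contact_map(coords_a, cutoff)
    cm_b = contact_map(coords_b, cutoff)
    labels = {}
    for i, j in pairs:
        if abs(i - j) < min_separation:
            continue  # skip pairs close in sequence; they are near each other in any state
        in_a, in_b = bool(cm_a[i, j]), bool(cm_b[i, j])
        if in_a and in_b:
            labels[(i, j)] = "common"
        elif in_a:
            labels[(i, j)] = "A-only"  # competing contact specific to conformation A
        elif in_b:
            labels[(i, j)] = "B-only"  # competing contact specific to conformation B
        else:
            labels[(i, j)] = "neither"
    return labels


# Toy 10-residue hairpin: strand 1 is fixed, strand 2 slides by 7.6 Å in conformation B.
x = 3.8 * np.arange(5)
strand1 = np.column_stack([x, np.zeros(5), np.zeros(5)])
strand2_a = np.column_stack([x[::-1], np.full(5, 5.0), np.zeros(5)])
strand2_b = strand2_a + np.array([7.6, 0.0, 0.0])
coords_a = np.vstack([strand1, strand2_a])
coords_b = np.vstack([strand1, strand2_b])

dca_pairs = [(0, 9), (2, 9), (2, 8), (0, 5), (3, 5)]  # hypothetical top DCA pairs
labels = classify_pairs(dca_pairs, coords_a, coords_b)
print(labels)
# {(0, 9): 'A-only', (2, 9): 'B-only', (2, 8): 'common', (0, 5): 'neither'}
```

In this sketch the "A-only" and "B-only" pairs are the ones that would distinguish the two configurations; how such pairs are weighted inside a structure-based model is not specified here.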
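The accuracy quoted in the abstract (mean ∼1.9 Å rmsd) is a root-mean-square deviation between predicted and experimental coordinates after optimal rigid superposition. A minimal sketch using the Kabsch algorithm is shown below, assuming two matched (N, 3) coordinate arrays; the atom selection (for example Cα only) and the helper name `kabsch_rmsd` are assumptions for illustration.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """rmsd between matched (N, 3) coordinate sets after optimal rigid superposition."""
    p_c = p - p.mean(axis=0)                    # remove translation
    q_c = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)       # SVD of the 3x3 cross-covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))      # -1 would mean a reflection; correct it
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # proper rotation taking p_c onto q_c
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff**2).sum(axis=1).mean()))


# Usage: a rotated and translated copy of a structure superimposes exactly.
rng = np.random.default_rng(0)
ref = 10.0 * rng.normal(size=(50, 3))
theta = np.deg2rad(40.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = ref @ rz.T + np.array([5.0, -3.0, 12.0])
noisy = moved + rng.normal(scale=1.0, size=moved.shape)
print(kabsch_rmsd(moved, ref))   # close to 0.0 (floating-point error only)
print(kabsch_rmsd(noisy, ref))   # roughly the size of the added per-atom noise
```

Superposing before measuring matters: without it, a pure rigid-body rotation or translation of an otherwise perfect prediction would register as a large rmsd.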
