Sequence-, structure-, and dynamics-based comparisons of structurally homologous CheY-like proteins

Significance

We study a set of proteins that exhibit low sequence identity, but high structural homology and functional similarity. It is demonstrated that a physics-based sequence comparison tool, the property factor method, is able to detect differences between the sequences of these proteins that correlate with differences in their structures and dynamics. It is shown that these sequence differences are not detected in this challenging system by conventional alignment methods. This result suggests that a significant amount of the information encoded in protein sequences is not captured by evolutionarily motivated comparison methods.

We recently introduced a physically based approach to sequence comparison, the property factor method (PFM). In the present work, we apply the PFM approach to the study of a challenging set of sequences: the bacterial chemotaxis protein CheY, the N-terminal receiver domain of the nitrogen regulation protein NT-NtrC, and the sporulation response regulator Spo0F. These are all response regulators involved in signal transduction. Despite functional similarity and structural homology, they exhibit low sequence identity. PFM sequence comparison demonstrates a statistically significant qualitative difference between the sequence of CheY and those of the other two proteins that is not found using conventional alignment methods. This difference is shown to be consonant with structural characteristics, using distance matrix comparisons. We also demonstrate that residues participating strongly in native contacts during unfolding are distributed differently in CheY than in the other two proteins. The PFM result is also in accord with dynamic simulation results of several types. Molecular dynamics simulations of all three proteins were carried out at several temperatures, and it is shown that the dynamics of CheY are predicted to differ from those of NT-NtrC and Spo0F. The predicted dynamic properties of the three proteins are in good agreement with experimentally determined B factors and with fluctuations predicted by the Gaussian network model. We pinpoint the differences between the PFM and traditional sequence comparisons and discuss the informatic basis for the ability of the PFM approach to detect physical differences between these sequences that are not apparent from traditional alignment-based comparison.
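
The abstract refers to distance matrix comparisons without describing the procedure. As a minimal sketch of one common form of such a comparison, and not necessarily the authors' method, the code below computes the root-mean-square difference between the pairwise distance matrices of two structures. It assumes each structure is given as a list of 3D coordinates (for example, one C-alpha atom per residue) and that the two lists are already residue-matched and of equal length. The function names are hypothetical.

```python
import math


def distance_matrix(coords):
    """Return the matrix of pairwise Euclidean distances between points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_matrix_rmsd(coords_a, coords_b):
    """RMS difference between the distance matrices of two structures.

    Assumption: coords_a and coords_b are residue-matched lists of
    (x, y, z) tuples with the same length (at least two points).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of points")
    n = len(coords_a)
    if n < 2:
        raise ValueError("need at least two points to compare distances")

    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)

    # Sum squared differences over the upper triangle (each pair once).
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += (da[i][j] - db[i][j]) ** 2
            pairs += 1
    return math.sqrt(total / pairs)


if __name__ == "__main__":
    # Toy coordinates, not taken from any real protein structure.
    bent = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
    straight = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
    print(distance_matrix_rmsd(bent, straight))
```

Because the measure depends only on internal distances, it is unchanged by rotating or translating either structure, so no prior superposition is needed. Residue correspondence between the two proteins still has to be established beforehand.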
