Hamming distance geometry of a protein conformational space: Application to the clustering of a 4‐ns molecular dynamics trajectory of the HIV‐1 integrase catalytic core

Protein structures can be encoded into binary sequences (Gabarro‐Arpa et al., Comput Chem 2000;24:693–698). These sequences are used to define a Hamming distance in conformational space: the distance between two different molecular conformations is the number of bits that differ between their sequences. Each bit in the sequence arises from a partition of conformational space into two halves. Thus, the information encoded in the binary sequences is also used to characterize the regions of conformational space visited by the system. We apply this distance and its associated geometric structures to the clustering and analysis of conformations sampled during a 4‐ns molecular dynamics simulation of the HIV‐1 integrase catalytic core. The cluster analysis of the simulation shows a division of the trajectory into two qualitatively different segments of 2.6 and 1.4 ns length: the data indicate that equilibration is reached only at the end of the first segment. The Hamming distance is also compared to the r.m.s. deviation measure. The analysis of the cases studied so far shows that, under the same conditions, the two measures behave quite differently, and that the Hamming distance appears to be more robust than the r.m.s. deviation. Proteins 2002;47:169–179. © 2002 Wiley‐Liss, Inc.
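
As a minimal sketch of the distance defined above, the following Python snippet counts the differing bits between binary sequences and builds the pairwise distance matrix that a clustering step could take as input. The 8-bit strings are hypothetical placeholders, not output of the encoding described in the cited work, and the function names are chosen here for illustration only.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(seqs: list[str]) -> list[list[int]]:
    """Symmetric matrix of Hamming distances between all sequences."""
    n = len(seqs)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = hamming_distance(seqs[i], seqs[j])
    return dist


# Hypothetical encoded conformations (assumed for illustration only).
conformations = ["10110010", "10110110", "01001101"]
matrix = pairwise_distances(conformations)
```

With these placeholder sequences, the first two conformations differ in a single bit, so they would fall on the same side of all but one of the partitions, while the third is the bitwise complement of the first and lies at the maximum possible distance from it.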
