A large data set comparison of protein structures determined by crystallography and NMR: Statistical test for structural differences and the effect of crystal packing

The existence of a large number of proteins for which both nuclear magnetic resonance (NMR) and X‐ray crystallographic coordinates have been deposited into the Protein Data Bank (PDB) makes possible the statistical comparison of the corresponding crystal and NMR structural models over a large data set, and facilitates the study of the effect of the crystal environment and other factors on structure. We present an approach for detecting statistically significant structural differences between crystal and NMR structural models that is based on structural superposition and the analysis of the distributions of atomic positions relative to a mean structure. We apply this approach to a set of 148 protein structure pairs (crystal vs NMR), and analyze the results in terms of methodological and physical sources of structural difference. For every one of the 148 structure pairs, the backbone root‐mean‐square distance (RMSD) over core atoms of the crystal structure to the mean NMR structure is larger than the average RMSD of the members of the NMR ensemble to the mean, with 76% of the structure pairs having an RMSD of the crystal structure to the mean more than a factor of two larger than the average RMSD of the NMR ensemble. On average, the backbone RMSD over core atoms of the crystal structure to the mean NMR structure is approximately 1 Å. If non‐core atoms are included, this increases to 1.4 Å because of variability in loops and similar regions of the protein. The observed structural differences are only weakly correlated with the age and quality of the structural model and with differences in the conditions under which the models were determined. We examine steric clashes when a putative crystalline lattice is constructed using a representative NMR structure, and find that repulsive crystal packing plays a minor role in the observed differences between crystal and NMR structures. The observed structural differences likely have a combination of physical and methodological causes.
Stabilizing attractive interactions arising from intermolecular crystal contacts, which shift the equilibrium of the crystal structure relative to the NMR structure, are a likely physical source that can account for some of the observed differences. Methodological sources of apparent structural difference include insufficient sampling or other issues that could give rise to errors in the estimates of the precision and/or accuracy. Proteins 2007. © 2007 Wiley‐Liss, Inc.
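
The RMSD comparison described in the abstract (crystal structure to the mean NMR structure, versus the average RMSD of ensemble members to that mean) can be illustrated with a minimal sketch. This is not the authors' implementation and does not reproduce their statistical test or core-atom selection. It assumes the coordinates are already restricted to matching core backbone atoms, uses a standard Kabsch least-squares superposition, and runs on synthetic data. The function names (`kabsch_fit`, `mean_structure`, `compare`) and the noise levels are hypothetical choices made for illustration.

```python
import numpy as np

def kabsch_fit(mobile, target):
    """Superpose `mobile` onto `target` (both N x 3) by least squares."""
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(mob_c.T @ tgt_c)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return mob_c @ rot + target.mean(axis=0)

def rmsd(a, b):
    """Root-mean-square distance between two N x 3 coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def mean_structure(ensemble, n_iter=5):
    """Iteratively superpose the models (M x N x 3) and average them."""
    mean = ensemble[0]
    for _ in range(n_iter):
        mean = np.mean([kabsch_fit(m, mean) for m in ensemble], axis=0)
    fitted = np.array([kabsch_fit(m, mean) for m in ensemble])
    return mean, fitted

def compare(crystal, ensemble):
    """Return (crystal-to-mean RMSD, average member-to-mean RMSD, ratio)."""
    mean, fitted = mean_structure(ensemble)
    ensemble_rmsd = float(np.mean([rmsd(m, mean) for m in fitted]))
    crystal_rmsd = rmsd(kabsch_fit(crystal, mean), mean)
    return crystal_rmsd, ensemble_rmsd, crystal_rmsd / ensemble_rmsd

# Synthetic example (assumed values, not data from the paper).
rng = np.random.default_rng(0)
base = rng.normal(scale=10.0, size=(60, 3))                 # 60 "core atoms"
ensemble = base + rng.normal(scale=0.1, size=(10, 60, 3))   # tight NMR-like ensemble
angle = 0.7
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
# "Crystal" model: larger deviation, arbitrary orientation and position.
crystal = (base + rng.normal(scale=0.5, size=(60, 3))) @ rot_z + 5.0

crystal_rmsd, ensemble_rmsd, ratio = compare(crystal, ensemble)
print(f"crystal to mean: {crystal_rmsd:.2f}, ensemble average: {ensemble_rmsd:.2f}, "
      f"ratio: {ratio:.1f}")
```

A ratio above 2 in this sketch corresponds to the "more than a factor of two" criterion quoted in the abstract; whether such a difference is statistically significant requires the distributional analysis the paper describes, which is not modeled here.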

[1]  M. Billeter,et al.  Comparison of protein structures determined by NMR in solution and by X-ray diffraction in single crystals , 1992, Quarterly Reviews of Biophysics.

[2]  M. Sippl Recognition of errors in three‐dimensional structures of proteins , 1993, Proteins.

[3]  Gaetano T Montelione,et al.  Clustering algorithms for identifying core atom sets and for assessing the precision of protein structure ensembles , 2005, Proteins.

[4]  Georgia Hadjipavlou,et al.  Linkage between dynamics and catalysis in a thermophilic-mesophilic enzyme pair , 2004, Nature Structural &Molecular Biology.

[5]  Michael Nilges,et al.  Materials and Methods Som Text Figs. S1 to S6 References Movies S1 to S5 Inferential Structure Determination , 2022 .

[6]  G. Montelione,et al.  Simulated annealing with restrained molecular dynamics using CONGEN: Energy refinement of the NMR solution structures of epidermal and type‐α transforming growth factors , 1996, Protein science : a publication of the Protein Society.

[7]  Timothy F. Havel,et al.  An evaluation of the combined use of nuclear magnetic resonance and distance geometry for the determination of protein conformations in solution. , 1985, Journal of molecular biology.

[8]  Roberto Tejero,et al.  Simulated annealing with restrained molecular dynamics using a flexible restraint potential: Theory and evaluation with simulated NMR constraints , 1996, Protein science : a publication of the Protein Society.

[9]  G. Clore,et al.  How much backbone motion in ubiquitin is required to account for dipolar coupling data measured in multiple alignment media as assessed by independent cross-validation? , 2004, Journal of the American Chemical Society.

[10]  A. Brünger Free R value: a novel statistical quantity for assessing the accuracy of crystal structures , 1992, Nature.

[11]  A. Gronenborn,et al.  Assessing the quality of solution nuclear magnetic resonance structures by complete cross-validation. , 1993, Science.

[12]  Gert Vriend,et al.  The precision of NMR structure ensembles revisited , 2003, Journal of biomolecular NMR.

[13]  M. DePristo,et al.  Simultaneous determination of protein structure and dynamics , 2005, Nature.

[14]  Janet M. Thornton,et al.  Knowledge-based validation of protein structure coordinates derived by X-ray crystallography and NMR spectroscopy , 1994 .

[15]  Peter L. Brooks,et al.  Visualizing data , 1997 .

[16]  O. Jardetzky,et al.  An assessment of the precision and accuracy of protein structures determined by NMR. Dependence on distance errors. , 1994, Journal of molecular biology.

[17]  A. Bax,et al.  Are proteins even floppier than we thought? , 1997, Nature Structural Biology.

[18]  T W Muir,et al.  Synthetic, structural and biological studies of the ubiquitin system: chemically synthesized and native ubiquitin fold into identical three-dimensional structures. , 1994, The Biochemical journal.

[19]  Timothy F. Havel,et al.  NMR structure determination in solution: a critique and comparison with X-ray crystallography. , 1992, Annual review of biophysics and biomolecular structure.

[20]  A. Godzik,et al.  Are proteins ideal mixtures of amino acids? Analysis of energy parameter sets , 1995, Protein science : a publication of the Protein Society.

[21]  G. Clore,et al.  Concordance of residual dipolar couplings, backbone order parameters and crystallographic B-factors for a small alpha/beta protein: a unified picture of high probability, fast atomic motions in proteins. , 2006, Journal of molecular biology.

[22]  M. Blackledge Recent progress in the study of biomolecular structure and dynamics in solution from residual dipolar couplings , 2005 .

[23]  Lewis E. Kay,et al.  New Tools Provide New Insights in NMR Studies of Protein Dynamics , 2006, Science.

[24]  D. A. Bosco,et al.  Enzyme Dynamics During Catalysis , 2002, Science.

[25]  B. Matthews,et al.  Reversible lattice repacking illustrates the temperature dependence of macromolecular interactions. , 2001, Journal of molecular biology.

[26]  B. Matthews,et al.  Cryo-cooling in macromolecular crystallography: advantages, disadvantages and optimization , 2004, Quarterly Reviews of Biophysics.

[27]  L. Kay,et al.  Intrinsic dynamics of an enzyme underlies catalysis , 2005, Nature.

[28]  A. Kossiakoff,et al.  Structural effects induced by mutagenesis affected by crystal packing factors: The structure of a 30–51 disulfide mutant of basic pancreatic trypsin inhibitor , 1992, Proteins.

[29]  G. Montelione,et al.  A novel RNA-binding motif in influenza A virus non-structural protein 1 , 1997, Nature Structural Biology.

[30]  G. Marius Clore,et al.  Improving the Packing and Accuracy of NMR Structures with a Pseudopotential for the Radius of Gyration , 1999 .

[31]  M. DePristo,et al.  Heterogeneity and inaccuracy in protein structures solved by X-ray crystallography. , 2004, Structure.

[32]  Michael Nilges,et al.  Weighting of experimental evidence in macromolecular structure determination. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[33]  Jeanmarie Guenot,et al.  Variability of conformations at crystal contacts in BPTI represent true low‐energy structures: Correspondence among lattice packing and molecular dynamics structures , 1992, Proteins.

[34]  J. Prestegard,et al.  NMR evidence for slow collective motions in cyanometmyoglobin , 1997, Nature Structural Biology.

[35]  P. R. Fisk,et al.  Distributions in Statistics: Continuous Multivariate Distributions , 1971 .

[36]  A. Gronenborn,et al.  Structures of protein complexes by multidimensional heteronuclear magnetic resonance spectroscopy. , 1995, Critical reviews in biochemistry and molecular biology.

[37]  A. Brünger,et al.  Conformational variability of solution nuclear magnetic resonance structures. , 1995, Journal of molecular biology.

[38]  Robert Powers,et al.  Protein NMR recall, precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics. , 2005, Journal of the American Chemical Society.

[39]  C. Dobson,et al.  Mapping long-range interactions in alpha-synuclein using spin-label NMR and ensemble molecular dynamics simulations. , 2005, Journal of the American Chemical Society.

[40]  Helen M. Berman,et al.  Crystal structure of the unique RNA-binding domain of the influenza virus NS1 protein , 1997, Nature Structural Biology.

[41]  J. Thornton,et al.  PROCHECK: a program to check the stereochemical quality of protein structures , 1993 .

[42]  M Wilmanns,et al.  Molecular replacement with NMR models using distance-derived pseudo B factors. , 1996, Acta crystallographica. Section D, Biological crystallography.

[43]  M. Gerstein,et al.  Methods for displaying macromolecular structural uncertainty: application to the globins. , 1995, Journal of molecular graphics.

[44]  G. K. Bhattacharyya,et al.  Statistical Concepts And Methods , 1978 .

[45]  W. Gronwald,et al.  RFAC, a program for automated NMR R-factor estimation , 2000, Journal of biomolecular NMR.

[46]  Ad Bax,et al.  Validation of Protein Structure from Anisotropic Carbonyl Chemical Shifts in a Dilute Liquid Crystalline Phase , 1998 .

[47]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[48]  A T Brünger,et al.  Free R value: cross-validation in crystallography. , 1997, Methods in enzymology.

[49]  Ad Bax,et al.  Weak alignment NMR: a hawk-eyed view of biomolecular structure. , 2005, Current opinion in structural biology.

[50]  Ian W. Davis,et al.  Structure validation by Cα geometry: ϕ,ψ and Cβ deviation , 2003, Proteins.

[51]  G M Clore,et al.  Exploring the limits of precision and accuracy of protein structures determined by nuclear magnetic resonance spectroscopy. , 1993, Journal of molecular biology.

[52]  Nico Tjandra,et al.  Residual dipolar couplings in NMR structure analysis. , 2004, Annual review of biophysics and biomolecular structure.

[53]  Gert Vriend,et al.  Quantitative evaluation of experimental NMR restraints. , 2003, Journal of the American Chemical Society.

[54]  R. Lipsitz,et al.  RESIDUAL DIPOLAR COUPLINGS IN NMR STRUCTURE ANALYSIS ⁄1 , 2004 .

[55]  Ian W. Davis,et al.  Structure Validation by C a Geometry : f , y and C b Deviation , 2002 .

[56]  Michael Nilges,et al.  Molecular dynamics and accuracy of NMR structures: Effects of error bounds and data removal , 1999, Proteins.

[57]  A. Gronenborn,et al.  Improving the quality of NMR and crystallographic protein structures by means of a conformational database potential derived from structure databases , 1996, Protein science : a publication of the Protein Society.

[58]  C. Brooks,et al.  Generation of native-like protein structures from limited NMR data, modern force fields and advanced conformational sampling , 2005, Journal of biomolecular NMR.

[59]  U Bastolla,et al.  How to guarantee optimal stability for most representative structures in the protein data bank , 2001, Proteins.

[60]  Michael Nilges,et al.  A simple method for delineating well‐defined and variable regions in protein structures determined from interproton distance data , 1987 .

[61]  Gaetano T Montelione,et al.  Assessing precision and accuracy of protein structures derived from NMR data , 2005, Proteins.

[62]  Gaetano T Montelione,et al.  Evaluating protein structures determined by structural genomics consortia , 2006, Proteins.

[63]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[64]  Jens Meiler,et al.  Model-free analysis of protein backbone motion from residual dipolar couplings. , 2002, Journal of the American Chemical Society.

[65]  Miron Livny,et al.  RECOORD: A recalculated coordinate database of 500+ proteins from the PDB using restraints from the BioMagResBank , 2005, Proteins.

[66]  M. Gerstein,et al.  Average core structures and variability measures for protein families: application to the immunoglobulins. , 1995, Journal of molecular biology.

[67]  Milton C. Chew Distributions in Statistics: Continuous Univariate Distributions-1 and 2 , 1971 .

[68]  M. DePristo,et al.  Is one solution good enough? , 2006, Nature Structural &Molecular Biology.

[69]  M. Y. Lobanov,et al.  Comparison of X‐ray and NMR structures: Is there a systematic difference in residue contacts between X‐ray‐ and NMR‐resolved protein structures? , 2005, Proteins.

[70]  A. W. Pryor,et al.  Thermal vibrations in crystallography , 1975 .

[71]  J. Prestegard,et al.  Residual dipolar couplings in structure determination of biomolecules. , 2004, Chemical reviews.

[72]  D. Eisenberg,et al.  Assessment of protein models with three-dimensional profiles , 1992, Nature.

[73]  Ke Ruan,et al.  NMR residual dipolar couplings as probes of biomolecular dynamics. , 2006, Chemical reviews.

[74]  Mark A Depristo,et al.  Crystallographic refinement by knowledge-based exploration of complex energy landscapes. , 2005, Structure.

[75]  Axel T. Brunger,et al.  Thermal Motion and Conformational Disorder in Protein Crystal Structures: Comparison of Multi‐Conformer and Time‐Averaging Models , 1994 .