Validation of nuclear magnetic resonance structures of proteins and nucleic acids: Hydrogen geometry and nomenclature

A statistical analysis is reported of 1,200 of the 1,404 nuclear magnetic resonance (NMR)-derived protein and nucleic acid structures deposited in the Protein Data Bank (PDB) before 1999. Excluded from the analysis were entries not yet fully validated by the PDB and the more than 100 entries that contained fewer than 95% of the expected hydrogens. The aim was to assess the geometry of the hydrogens in the remaining structures and to check their nomenclature. Bond lengths, bond angles, improper dihedral angles, and planarity were compared with estimated values. More than 100 entries showed anomalous protonation states for some of their amino acids. Approximately 250,000 atom names (1.7%) differed from the consensus PDB nomenclature; most of these inconsistencies are due to swapped labeling of prochiral atoms. Large deviations from the expected geometry exist in a considerable number of entries, many of which are average structures. The most common causes of these deviations appear to be poor minimization of average structures and an improper balance between the force-field terms for experimental data and those for holonomic constraints. Some specific geometric outliers are related to the refinement programs used. Recommendations are given for biomolecular databases, modeling programs, and authors submitting biomolecular structures. Proteins 1999;37:404–416. ©1999 Wiley-Liss, Inc.
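The geometric checks summarized above (bond lengths, bond angles, improper dihedral angles) and the detection of swapped prochiral labels can be illustrated with a minimal Python sketch. It is not the validation software used in the study: the function names, the Z-score threshold of 4.0, and the illustrative C-H target of 1.09 Å with a standard deviation of 0.02 Å are all assumptions. The swap check relies on one geometric fact: interchanging the labels of two prochiral hydrogens (for example HB2 and HB3) reverses the sign of an improper dihedral defined over them, so comparing that sign with a reference value flags a likely swap.

```python
import math
from typing import Sequence

Vec = Sequence[float]  # an (x, y, z) coordinate triple, e.g. in angstrom


def _sub(a: Vec, b: Vec) -> tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(k: float, a: Vec) -> tuple[float, float, float]:
    return (k * a[0], k * a[1], k * a[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_length(a: Vec, b: Vec) -> float:
    """Distance between two atoms."""
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))


def bond_angle(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees, with b as the vertex."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding noise


def dihedral(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Signed dihedral a-b-c-d in degrees, in [-180, 180].

    Positive means clockwise when viewed along b->c. The same formula serves
    for improper dihedrals, where the four atoms need not form a bonded chain.
    """
    b0, b2 = _sub(a, b), _sub(d, c)
    b1 = _sub(c, b)
    b1 = _scale(1.0 / math.sqrt(_dot(b1, b1)), b1)
    # Project the two outer vectors onto the plane perpendicular to b->c.
    v = _sub(b0, _scale(_dot(b0, b1), b1))
    w = _sub(b2, _scale(_dot(b2, b1), b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def is_outlier(observed: float, target: float, sigma: float,
               threshold: float = 4.0) -> bool:
    """Flag a value whose Z-score |observed - target| / sigma exceeds threshold.

    The default threshold of 4.0 is an assumption, not a value from the study.
    """
    return abs(observed - target) / sigma > threshold


def labels_look_swapped(observed_improper: float, reference_improper: float) -> bool:
    """A sign opposite to the reference improper suggests swapped prochiral labels."""
    return observed_improper * reference_improper < 0


if __name__ == "__main__":
    # Hypothetical ideal tetrahedral CH2 group, hydrogens 1.09 A from the carbon.
    s = 1.09 / math.sqrt(3.0)
    c = (0.0, 0.0, 0.0)                     # central carbon
    x1 = (s, s, s)                          # heavy-atom neighbour (its distance does not affect the dihedral)
    h2, h3 = (s, -s, -s), (-s, s, -s)       # the two prochiral hydrogens
    ref = dihedral(x1, c, h2, h3)           # about -120 for this geometry
    print(round(bond_length(c, h2), 3))     # 1.09
    print(round(bond_angle(h2, c, h3), 2))  # 109.47
    # Illustrative C-H target and sigma (assumed values).
    print(is_outlier(bond_length(c, h2), 1.09, 0.02))         # False
    print(labels_look_swapped(dihedral(x1, c, h3, h2), ref))  # True
```

Running the module prints 1.09, 109.47, False, and True: the ideal geometry passes the bond-length check, and listing the two hydrogens in the opposite order, as a swapped label would, reverses the sign of the improper dihedral relative to the reference.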
