Evaluating protein structures determined by structural genomics consortia

Structural genomics projects are providing large quantities of new 3D structural data for proteins. To monitor the quality of these data, we have developed the protein structure validation software suite (PSVS), for assessment of protein structures generated by NMR or X‐ray crystallographic methods. PSVS is broadly applicable for structure quality assessment in structural biology projects. The software integrates under a single interface analyses from several widely‐used structure quality evaluation tools, including PROCHECK (Laskowski et al., J Appl Crystallog 1993;26:283–291), MolProbity (Lovell et al., Proteins 2003;50:437–450), Verify3D (Luthy et al., Nature 1992;356:83–85), ProsaII (Sippl, Proteins 1993;17:355–362), the PDB validation software, and various structure‐validation tools developed in our own laboratory. PSVS provides standard constraint analyses, statistics on goodness‐of‐fit between structures and experimental data, and knowledge‐based structure quality scores in standardized format suitable for database integration. The analysis provides both global and site‐specific measures of protein structure quality. Global quality measures are reported as Z scores, based on calibration with a set of high‐resolution X‐ray crystal structures. PSVS is particularly useful in assessing protein structures determined by NMR methods, but is also valuable for assessing X‐ray crystal structures or homology models. Using these tools, we assessed protein structures generated by the Northeast Structural Genomics Consortium and other international structural genomics projects, over a 5‐year period. Protein structures produced from structural genomics projects exhibit quality score distributions similar to those of structures produced in traditional structural biology projects during the same time period. However, while some NMR structures have structure quality scores similar to those seen in higher‐resolution X‐ray crystal structures, the majority of NMR structures have lower scores. Potential reasons for this “structure quality score gap” between NMR and X‐ray crystal structures are discussed. Proteins 2007. © 2006 Wiley‐Liss, Inc.
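
The Z-score calibration mentioned above can be illustrated with a minimal sketch. It assumes the conventional definition, in which a raw quality score is expressed as its distance from the mean of a reference set, in units of that set's standard deviation. The function name `z_score`, the use of the sample standard deviation, and the reference values are all hypothetical and are not taken from PSVS; any sign convention PSVS applies for scores where lower raw values are better is not modeled here.

```python
from statistics import mean, stdev


def z_score(raw_score, reference_scores):
    """Express raw_score relative to a reference distribution.

    reference_scores stands in for the raw scores of a calibration set
    (e.g. high-resolution X-ray crystal structures). Using the sample
    standard deviation is an assumption of this sketch.
    """
    mu = mean(reference_scores)
    sigma = stdev(reference_scores)
    if sigma == 0:
        raise ValueError("reference scores have zero spread")
    return (raw_score - mu) / sigma


# Hypothetical raw scores for a calibration set (illustrative numbers only).
reference = [0.40, 0.45, 0.50, 0.55, 0.60]

# A structure scoring below the reference mean gets a negative Z score.
print(z_score(0.35, reference))
```

Reporting scores this way puts measures with different native scales on a common footing, which is what allows the abstract's comparison of score distributions between NMR and X‐ray crystal structures.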
