MolProbity: More and better reference data for improved all‐atom structure validation

This paper describes the current state of the macromolecular model validation services provided at the MolProbity website, emphasizing changes and additions since the previous review in 2010. There have been many infrastructure improvements, including the replacement of earlier Java utilities with existing or newly written Python utilities in the open‐source CCTBX portion of the Phenix software system. This improves long‐term maintainability and deepens the integration of MolProbity‐style validation within Phenix. There is now a complete MolProbity mirror site at http://molprobity.manchester.ac.uk. GitHub serves our open‐source code, reference datasets, and the resulting multi‐dimensional distributions that define most validation criteria. Coordinate output after Asn/Gln/His “flip” correction is now more idealized, since the follow‐up refinement step has apparently often been skipped in the past. Two distinct sets of heavy‐atom‐to‐hydrogen distances and accompanying van der Waals radii have been researched and made more accurate: one for the electron‐cloud‐center positions suitable for X‐ray crystallography and one for nuclear positions. New validations include messages at input about problem‐causing format irregularities, updates of Ramachandran and rotamer criteria from the million quality‐filtered residues in a new reference dataset, the CaBLAM Cα‐CO virtual‐angle analysis of backbone and secondary structure for cryoEM or low‐resolution X‐ray, and flagging of the very rare cis‐nonProline and twisted peptides, which have recently been greatly overused in models. Owing to the wide application of MolProbity validation and corrections by the research community, in Phenix, and at the worldwide Protein Data Bank, newly deposited structures have continued to improve greatly as measured by MolProbity's unique all‐atom clashscore.
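
The flagging of cis‐nonProline and twisted peptides mentioned above can be illustrated with a minimal sketch that classifies a peptide bond from its ω dihedral angle. The function names and the 30° cutoff from planarity are assumptions made for illustration; this is not MolProbity's or CCTBX's actual code.

```python
def classify_omega(omega_deg, cutoff_deg=30.0):
    """Classify a peptide bond as 'trans', 'cis', or 'twisted' from omega (degrees).

    cutoff_deg is an assumed tolerance from planarity, used here for illustration.
    """
    # Wrap the angle into the range [-180, 180).
    w = (omega_deg + 180.0) % 360.0 - 180.0
    dev_from_cis = abs(w)              # distance from omega = 0
    dev_from_trans = 180.0 - abs(w)    # distance from omega = +/-180
    if dev_from_trans <= cutoff_deg:
        return "trans"
    if dev_from_cis <= cutoff_deg:
        return "cis"
    return "twisted"


def flag_peptide(omega_deg, resname):
    """Return a warning label for a rare peptide conformation, or None.

    resname is the residue that contributes the N atom of the peptide bond;
    a cis peptide preceding proline is not flagged in this sketch.
    """
    kind = classify_omega(omega_deg)
    if kind == "twisted":
        return "twisted peptide"
    if kind == "cis" and resname.upper() != "PRO":
        return "cis-nonPro peptide"
    return None
```

For example, `flag_peptide(5.0, "ALA")` reports a cis‐nonPro peptide, whereas `flag_peptide(5.0, "PRO")` and `flag_peptide(178.0, "ALA")` return `None`.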
