An approach to creating a more realistic working model from a Protein Data Bank entry

An accurate model of three-dimensional protein structure is important in fields such as structure-based drug design and mechanistic studies of enzymatic reactions. Although entries in the Protein Data Bank (http://www.pdb.org) provide valuable information about protein structures, a small fraction of PDB structures were found to contain anomalies that are not reported in the PDB file. The semiempirical PM7 method in MOPAC2012 was used to identify anomalously short hydrogen bonds, C–H⋯O/C–H⋯N interactions, non-bonding close contacts, and unrealistic covalent bond lengths in recently published PDB files. It was also used to generate new structures with these faults removed. When the semiempirical models were compared with those from PDB_REDO (http://www.cmbi.ru.nl/pdb_redo/), their clashscores, as defined by MolProbity (http://molprobity.biochem.duke.edu/), were better in about 50% of the structures. In nearly all cases, the semiempirical models also had a lower root-mean-square deviation (RMSD) from the original structure than the PDB_REDO models, indicating better conservation of the tertiary structure. Finally, the semiempirical models had lower clashscores than the initial PDB file in all but one case. Because this approach preserves as much of the original tertiary structure as possible while correcting anomalous interactions, it should be useful to theoreticians, experimentalists, and crystallographers investigating protein structure and function.
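Two of the comparisons above, steric clashes normalized per 1000 atoms and coordinate RMSD between two models, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions and does not reproduce the PM7/MOPAC2012 procedure or MolProbity's implementation. MolProbity's clashscore is, roughly, the number of serious all-atom overlaps (0.4 Å or more) per 1000 atoms. Here the overlap test uses Bondi van der Waals radii (an assumption; MolProbity uses its own radii and its own hydrogen-bond handling). The `Atom` record, the function names, and the caller-supplied `exempt` pair set are hypothetical. The RMSD function assumes both models are atom-matched and share the deposited coordinate frame, so no superposition is performed.

```python
"""Sketch: flag van der Waals overlaps and compare two models by RMSD.

Assumptions (illustrative, not taken from the paper or from MolProbity):
- atoms are simple (element, x, y, z) records;
- radii are Bondi van der Waals radii, whereas MolProbity uses its own set;
- the caller supplies every pair to skip (covalent 1-2 and 1-3 neighbours,
  hydrogen bonds), since this sketch has no bond perception.
"""
from __future__ import annotations

import math
from itertools import combinations
from typing import NamedTuple

# Bondi van der Waals radii in angstroms (assumed; extend as needed).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


class Atom(NamedTuple):
    element: str
    x: float
    y: float
    z: float


def distance(a: Atom, b: Atom) -> float:
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def serious_overlaps(
    atoms: list[Atom],
    exempt: set[tuple[int, int]],
    min_overlap: float = 0.4,
) -> list[tuple[int, int, float]]:
    """Return (i, j, overlap) for non-exempt pairs whose vdW spheres overlap
    by at least `min_overlap` angstroms. Brute-force O(n^2) pair scan."""
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        if (i, j) in exempt or (j, i) in exempt:
            continue
        overlap = VDW_RADII[a.element] + VDW_RADII[b.element] - distance(a, b)
        if overlap >= min_overlap:
            flagged.append((i, j, round(overlap, 2)))
    return flagged


def clashes_per_1000_atoms(n_clashes: int, n_atoms: int) -> float:
    """Normalize a clash count per 1000 atoms, the scale of a clashscore."""
    return 1000.0 * n_clashes / n_atoms


def rmsd(model_a: list[Atom], model_b: list[Atom]) -> float:
    """Coordinate RMSD with no superposition step.

    Assumes both models list the same atoms in the same order and share the
    deposited coordinate frame; otherwise superimpose them first (e.g. Kabsch).
    """
    if not model_a or len(model_a) != len(model_b):
        raise ValueError("models must be non-empty and atom-matched")
    total = sum(distance(a, b) ** 2 for a, b in zip(model_a, model_b))
    return math.sqrt(total / len(model_a))


if __name__ == "__main__":
    model = [
        Atom("C", 0.0, 0.0, 0.0),
        Atom("C", 1.5, 0.0, 0.0),   # covalently bonded to atom 0 (exempt)
        Atom("O", -2.5, 0.0, 0.0),  # 2.5 Å from atom 0: too close for C...O
        Atom("H", 10.0, 0.0, 0.0),
    ]
    clashes = serious_overlaps(model, exempt={(0, 1)})
    print(clashes)                                           # [(0, 2, 0.72)]
    print(clashes_per_1000_atoms(len(clashes), len(model)))  # 250.0
    shifted = [a._replace(x=a.x + 1.0) for a in model]
    print(rmsd(model, shifted))                              # 1.0
```

The exempt set is left to the caller because deciding which close pairs are legitimate (bonded neighbours, genuine hydrogen bonds) is exactly the chemistry that a real validation tool encodes; a production version would also replace the quadratic pair scan with a spatial grid.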
