Homology-based loop modelling yields more complete crystallographic protein structures

Inherent protein flexibility, poor or low-resolution diffraction data, and poor electron density maps often hinder the building of complete structural models during X-ray structure determination. However, advances in crystallographic refinement and model building now often make it possible to complete previously missing parts. Here, we present algorithms that identify regions that are missing in a given model but present in homologous structures in the Protein Data Bank (PDB), and that “graft” these regions onto the model. These new regions are refined and validated in a fully automated procedure. Including these developments in our PDB-REDO pipeline allowed us to build 24,962 missing loops in the PDB. The models and the automated procedures are publicly available through the PDB-REDO databank and web server (https://pdb-redo.eu). More complete protein structure models enable not only a higher-quality public archive but also a better understanding of protein function, better comparison between homologous structures, and more complete data mining in structural bioinformatics projects.

Synopsis: Thousands of missing regions in existing protein structure models are completed using new methods based on homology.
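The first step in the abstract is finding regions that are missing in one model but present in a homologue. A minimal sketch of that step follows. It is an illustration under stated assumptions, not the PDB-REDO implementation. Residues are numbered 0..n-1 along the target's deposited sequence. `alignment` is a hypothetical, precomputed mapping from target positions to homologue positions. The helper names `missing_runs` and `graftable_loops` are invented for this example. Only internal gaps are considered, meaning gaps flanked by modelled residues, since a loop is assumed to need an anchor residue on each side.

```python
from typing import Dict, List, Optional, Set, Tuple


def missing_runs(length: int, modelled: Set[int]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) ranges of unmodelled residues that are
    flanked on both sides by modelled residues. Terminal gaps are skipped
    because they lack an anchor on one side (an assumption of this sketch)."""
    runs = []
    i = 0
    while i < length:
        if i in modelled:
            i += 1
            continue
        start = i
        while i < length and i not in modelled:
            i += 1
        # start - 1 and i are modelled anchors only if both lie inside the chain.
        if start > 0 and i < length:
            runs.append((start, i - 1))
    return runs


def graftable_loops(
    target_len: int,
    target_modelled: Set[int],
    homolog_modelled: Set[int],
    alignment: Dict[int, int],  # hypothetical: target index -> homologue index
) -> List[Tuple[int, int]]:
    """Return target gaps whose residues, plus one anchor on each side,
    are all aligned to consecutive, modelled residues in the homologue."""
    candidates = []
    for start, end in missing_runs(target_len, target_modelled):
        span = range(start - 1, end + 2)  # gap plus both anchor residues
        mapped: List[Optional[int]] = [alignment.get(t) for t in span]
        if any(h is None for h in mapped):
            continue  # some target residue has no aligned homologue residue
        if any(b != a + 1 for a, b in zip(mapped, mapped[1:])):
            continue  # insertion or deletion inside the span: lengths differ
        if all(h in homolog_modelled for h in mapped):
            candidates.append((start, end))
    return candidates
```

For example, suppose a 10-residue target has residues 4 to 6 unmodelled, and a fully modelled homologue aligns one-to-one. Then `graftable_loops(10, {0, 1, 2, 3, 7, 8, 9}, set(range(10)), {i: i for i in range(10)})` yields `[(4, 6)]`. The grafting itself, and the subsequent refinement and validation, are not shown.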
