Worldwide Protein Data Bank biocuration supporting open access to high-quality 3D structural biology data

Abstract

The Protein Data Bank (PDB) is the single global repository for experimentally determined 3D structures of biological macromolecules and their complexes with ligands. The worldwide PDB (wwPDB) is the international collaboration that manages the PDB archive according to the FAIR principles: Findability, Accessibility, Interoperability and Reusability. The wwPDB recently developed OneDep, a unified tool for deposition, validation and biocuration of structures of biological macromolecules. All data deposited to the PDB undergo critical review by wwPDB Biocurators. This article outlines the importance of biocuration for structural biology data deposited to the PDB and describes wwPDB biocuration processes and the role of expert Biocurators in sustaining a high-quality archive. Structural data submitted to the PDB are examined for self-consistency, standardized using controlled vocabularies, cross-referenced with other biological data resources and validated for scientific/technical accuracy. We illustrate how biocuration is integral to PDB data archiving, as it facilitates accurate, consistent and comprehensive representation of biological structure data, allowing efficient and effective usage by research scientists, educators, students and the curious public worldwide.

Database URL: https://www.wwpdb.org/

Genji Kurisu | Haruki Nakamura | Zukang Feng | John D. Westbrook | Helen M. Berman | Abhik Mukhopadhyay | Sameer Velankar | Swanand P. Gore | Huanwang Yang | Ezra Peisach | Gerard J. Kleywegt | Ardan Patwardhan | Chenghua Shao | Stephen K. Burley | John L. Markley | Li Chen | Luigi Di Costanzo | Sutapa Ghosh | Vladimir Guranovic | Brian P. Hudson | Raul Sala | Monica Sekharan | Lihua Tan | Eduardo Sanz-García | Dimitris Dimitropoulos | Junko Sato | Guanghua Gao | Jasmine Young | Irina Persikova | John M. Berrisford | Yasuyo Ikegawa | Minyu Chen | Catherine L. Lawson | Sanchayita Sen | Yumiko Kengaku | Yuhe Liang | Buvaneswari Coimbatore Narayanan | Gaurav Sahni | Marina Zhuravleva | Jawahar Swaminathan | Thomas J. Oldfield | Alice R. Clark | Aleksandras Gutmanas | Glen van Ginkel | David R. Armstrong | Lora Mak | Oliver S. Smart | Reiko Igarashi | Kumaran Baskaran | Pieter M. S. Hendrickx | Kayoko Nishiyama
