Protein Data Bank: the single global archive for 3D macromolecular structure data

Abstract: The Protein Data Bank (PDB) is the single global archive of experimentally determined three-dimensional (3D) structure data of biological macromolecules. Since 2003, the PDB has been managed by the Worldwide Protein Data Bank (wwPDB; wwpdb.org), an international consortium that collaboratively oversees deposition, validation, biocuration, and open access dissemination of 3D macromolecular structure data. The PDB Core Archive houses 3D atomic coordinates of more than 144 000 structural models of proteins, DNA/RNA, and their complexes with metals and small molecules, together with related experimental data and metadata. Structures and experimental data/metadata are stored in the PDB Core Archive using the readily extensible wwPDB PDBx/mmCIF master data format, which will continue to evolve as data/metadata from new experimental techniques and structure determination methods are incorporated by the wwPDB. The impact of the recently developed universal wwPDB OneDep deposition/validation/biocuration system, and of the various methods-specific wwPDB Validation Task Forces, on the quality of structures and data housed in the PDB Core Archive is described, together with current challenges and future plans.
