RCSB Protein Data Bank: Sustaining a living digital data resource that enables breakthroughs in scientific research and biomedical education

The Protein Data Bank (PDB) is one of two archival resources for experimental data central to biomedical research and education worldwide (the other key Primary Data Archive in biology being the International Nucleotide Sequence Database Collaboration). The PDB currently houses >134,000 atomic-level biomolecular structures determined by crystallography, NMR spectroscopy, and 3D electron microscopy. It was established in 1971 as the first open‐access, digital‐data resource in biology, and is managed by the Worldwide Protein Data Bank partnership (wwPDB; wwpdb.org). US PDB operations are conducted by the RCSB Protein Data Bank (RCSB PDB; RCSB.org; Rutgers University and UC San Diego) and funded by NSF, NIH, and DoE. The RCSB PDB serves as the global Archive Keeper for the wwPDB. During calendar year 2016, >591 million structure data files were downloaded from the PDB by Data Consumers working in every sovereign nation recognized by the United Nations. During this same period, the RCSB PDB processed >5300 new atomic-level biomolecular structures, plus experimental data and metadata, coming into the archive from Data Depositors working in the Americas and Oceania. In addition, RCSB PDB served >1 million RCSB.org users worldwide with PDB data integrated with ∼40 external data resources, providing rich structural views of fundamental biology, biomedicine, and energy sciences, and served >600,000 users of the PDB101.rcsb.org educational website around the globe. RCSB PDB resources are described in detail, together with metrics documenting the impact of access to PDB data on basic and applied research, clinical medicine, education, and the economy.
