RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences

Abstract: The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), the US data center for the global PDB archive and a founding member of the Worldwide Protein Data Bank partnership, serves tens of thousands of data depositors in the Americas and Oceania and makes 3D macromolecular structure data available at no charge and without restrictions to millions of RCSB.org users around the world, including >660 000 educators, students and members of the curious public using PDB101.RCSB.org. PDB data depositors include structural biologists using macromolecular crystallography, nuclear magnetic resonance spectroscopy, 3D electron microscopy and micro-electron diffraction. PDB data consumers accessing our web portals include researchers, educators and students studying fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. During the past 2 years, the research-focused RCSB PDB web portal (RCSB.org) has undergone a complete redesign, enabling improved searching with full Boolean operator logic and more facile access to PDB data integrated with >40 external biodata resources. New features and resources are described in detail using examples that showcase recently released structures of SARS-CoV-2 proteins and host cell proteins relevant to understanding and addressing the COVID-19 global pandemic.
