ProtaBank: A repository for protein design and engineering data

We present ProtaBank, a repository for storing, querying, analyzing, and sharing protein design and engineering data in an actively maintained and updated database. ProtaBank provides a format to describe and compare all types of protein mutational data, spanning a wide range of properties and techniques. It features a user-friendly web interface and a programming layer that streamline data deposition and allow for batch input and queries. The database schema incorporates a standard format for reporting protein sequences and experimental data, which facilitates comparison of results across different data sets. A suite of analysis and visualization tools is provided to facilitate discovery, to guide future designs, and to benchmark and train new predictive tools and algorithms. ProtaBank will provide a valuable resource to the protein engineering community by storing and safeguarding newly generated data, allowing fast searching and identification of relevant data from the existing literature, and enabling the exploration of correlations between disparate data sets. ProtaBank invites researchers to contribute data to the database to make it accessible for search and analysis. ProtaBank is available at https://protabank.org.
