Novel algorithms and benchmarks for computational protein design

Author(s): Ollikainen, Noah
Advisor(s): Kortemme, Tanja

Abstract: Computational protein design aims to predict protein sequences that will fold into a given three-dimensional structure and perform a desired function. Although computational protein design has achieved significant successes in recent years, including the design of novel enzymes and protein-protein interactions, its accuracy remains relatively low, and many designed sequences must be tested experimentally to obtain a successful design. Moreover, successful designs often require directed evolution to reach catalytic activities or binding affinities comparable to those of naturally occurring proteins. A major challenge limiting the accuracy of computational protein design is the inability to sample protein sequence and conformational space sufficiently at high resolution. Sampling is difficult because of the combinatorially large number of possible protein sequences and the inherent flexibility of the protein backbone, whose conformation may change in response to changes in sequence. To address the problem of sampling a large number of sequences, I developed a deterministic computational protein design algorithm that identifies all sequences within a given energy of the global minimum energy sequence. To identify an accurate method of modeling backbone flexibility, I created a benchmark that evaluates designed sequences by their similarity to natural sequences with respect to amino acid covariation. Lastly, I developed a novel method for coupling side-chain and backbone sampling. I applied this method to redesigning enzyme substrate specificity and showed a substantial improvement in accuracy over previous computational protein design methods. Taken together, these results demonstrate the importance of modeling protein backbone flexibility and provide new tools to enable higher-accuracy computational protein design.
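The abstract does not describe how the deterministic algorithm works internally. To illustrate the underlying problem (returning every sequence whose energy lies within a fixed window of the global minimum), here is a minimal sketch using a generic depth-first branch-and-bound search over a pairwise-decomposable energy. It is not the dissertation's algorithm. The energy tables `e1` and `e2`, the toy three-position example, and all function names are hypothetical, and each amino acid's energy is assumed to be already minimized over side-chain rotamers.

```python
"""Illustrative sketch (not the dissertation's method): enumerate all
sequences within an energy window of the global minimum energy sequence.

Hypothetical inputs:
  e1[i][aa]            -> self energy of amino acid aa at position i
  e2[(i, j)][(aa, bb)] -> pair energy between positions i < j
"""
from itertools import product


def total_energy(seq, e1, e2):
    """Exact energy of a complete sequence."""
    energy = sum(e1[i][aa] for i, aa in enumerate(seq))
    energy += sum(t[(seq[i], seq[j])] for (i, j), t in e2.items())
    return energy


def lower_bound(prefix, e1, e2):
    """Lower bound on the energy of any completion of `prefix`.

    Positions 0..k-1 are fixed and k..n-1 are free. Each undecided term is
    minimized independently, so the bound never exceeds the true energy.
    """
    k, n = len(prefix), len(e1)
    bound = sum(e1[i][prefix[i]] for i in range(k))
    # Pairs between two fixed positions are known exactly.
    bound += sum(t[(prefix[i], prefix[j])] for (i, j), t in e2.items() if j < k)
    # Each free position: best residue given its fixed partners.
    for j in range(k, n):
        bound += min(
            e1[j][b] + sum(e2[(i, j)][(prefix[i], b)] for i in range(k) if (i, j) in e2)
            for b in e1[j]
        )
    # Pairs between two free positions: best pair energy, chosen independently.
    bound += sum(min(t.values()) for (i, j), t in e2.items() if i >= k)
    return bound


def global_minimum(e1, e2):
    """Return (energy, sequence) of the global minimum energy sequence."""
    best = [float("inf"), None]

    def dfs(prefix):
        lb = lower_bound(prefix, e1, e2)
        if lb >= best[0]:
            return  # cannot beat the best complete sequence found so far
        if len(prefix) == len(e1):
            best[0], best[1] = lb, tuple(prefix)  # bound is exact here
            return
        for aa in e1[len(prefix)]:
            dfs(prefix + [aa])

    dfs([])
    return best[0], best[1]


def enumerate_within(e1, e2, window):
    """All (energy, sequence) pairs with energy <= E_min + window, sorted."""
    e_min, _ = global_minimum(e1, e2)
    threshold = e_min + window
    hits = []

    def dfs(prefix):
        lb = lower_bound(prefix, e1, e2)
        if lb > threshold:
            return  # no completion of this prefix can fall inside the window
        if len(prefix) == len(e1):
            hits.append((lb, tuple(prefix)))
            return
        for aa in e1[len(prefix)]:
            dfs(prefix + [aa])

    dfs([])
    return sorted(hits)


def brute_force(e1, e2, window):
    """Exhaustive reference answer; feasible only for tiny problems."""
    scored = [(total_energy(s, e1, e2), s) for s in product(*e1)]
    e_min = min(e for e, _ in scored)
    return sorted((e, s) for e, s in scored if e <= e_min + window)


# Toy three-position problem with made-up energies.
e1 = [{"A": 0.0, "V": 1.0}, {"L": 0.5, "I": 0.0}, {"S": 0.0, "T": 0.25}]
e2 = {
    (0, 1): {("A", "L"): -1.0, ("A", "I"): 0.0, ("V", "L"): 0.0, ("V", "I"): -0.5},
    (1, 2): {("L", "S"): 0.0, ("L", "T"): -0.5, ("I", "S"): 0.0, ("I", "T"): 0.0},
}

if __name__ == "__main__":
    for energy, seq in enumerate_within(e1, e2, window=1.0):
        print("".join(seq), energy)  # ALT -0.75, ALS -0.5, AIS 0.0, AIT 0.25
```

Because the bound never overestimates the energy of any completion, pruning a branch cannot discard a sequence inside the window, so the enumeration is complete. The exhaustive `brute_force` helper exists only to cross-check that property on tiny inputs. Realistic design problems are far larger and would typically need tighter bounds (for example, dead-end elimination before the search) to remain tractable.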
