Capturing atomic interactions with a graphical framework in computational protein design

A protein's amino acid sequence determines both its chemical and its physical structure, and together these two structures determine its function. Protein designers seek new amino acid sequences whose chemical and physical structures can perform a desired function. The vast size of sequence space frustrates efforts to find useful sequences, so protein designers model proteins on computers and search through amino acid sequence space computationally. They build three-dimensional structures for the sequences they examine, specifying the location of each atom, and evaluate the stability of these structures. Good structures are tightly packed but free of collisions. Designers seek a sequence with a stable structure that meets the geometric and chemical requirements to function as desired; they frame their search as an optimization problem. In this dissertation, I present a graphical model of the central optimization problem in protein design, the side-chain-placement problem. This model allows the formulation of a dynamic programming solution, thus connecting side-chain placement with the class of NP-complete problems for which certain instances admit polynomial-time solutions. Moreover, the graphical model suggests a natural data structure for storing the energies used in design. With this data structure, I have created an extensible framework for the representation of energies during side-chain-placement optimization and have incorporated this framework into the Rosetta molecular modeling program. I present one extension that incorporates a new degree of structural variability into the optimization process. I present another extension that includes a non-pairwise-decomposable energy function, the first of its kind in protein design, laying the groundwork to capture aspects of protein stability that could not previously be incorporated into the optimization of side-chain placement.
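The dynamic programming idea can be illustrated with a minimal sketch. It assumes a pairwise-decomposable energy stored on an interaction graph: each vertex is a residue with a small set of discrete side-chain states (rotamers), each vertex carries one-body energies, and each edge carries a table of two-body energies. Vertices are eliminated one at a time, and the running time is exponential only in the size of the largest neighbor set created during elimination, which is why instances with a sparse, tree-like graph can be solved in polynomial time. The function name `min_energy_assignment`, the dictionary layout, and the toy energies are all hypothetical choices made for this illustration; this is not the algorithm or data structure as implemented in the dissertation or in Rosetta.

```python
from itertools import product


def min_energy_assignment(n_states, one_body, two_body, order=None):
    """Minimize a pairwise energy by vertex elimination (illustrative sketch).

    n_states: {vertex: number of states}
    one_body: {vertex: [energy per state]}
    two_body: {(u, v): table} with table[state_of_u][state_of_v]
    order:    elimination order (assumed default: sorted vertices)
    """
    # Every energy term is a factor: (scope, {state tuple: energy}).
    factors = [((v,), {(s,): e for s, e in enumerate(energies)})
               for v, energies in one_body.items()]
    for (u, v), table in two_body.items():
        factors.append(((u, v), {(a, b): table[a][b]
                                 for a in range(n_states[u])
                                 for b in range(n_states[v])}))

    order = list(order) if order is not None else sorted(n_states)
    decisions = []  # (vertex, neighbor scope, best state per neighbor assignment)
    for v in order:
        touching = [f for f in factors if v in f[0]]
        factors = [f for f in factors if v not in f[0]]
        scope = tuple(sorted({u for sc, _ in touching for u in sc} - {v}))
        new_table, best = {}, {}
        # For each assignment of the neighbors, record the best state of v.
        for assignment in product(*(range(n_states[u]) for u in scope)):
            ctx = dict(zip(scope, assignment))
            best_e, best_s = float("inf"), None
            for s in range(n_states[v]):
                ctx[v] = s
                e = sum(tbl[tuple(ctx[u] for u in sc)] for sc, tbl in touching)
                if e < best_e:
                    best_e, best_s = e, s
            new_table[assignment] = best_e
            best[assignment] = best_s
        factors.append((scope, new_table))
        decisions.append((v, scope, best))

    # Only constant factors (empty scope) remain; their sum is the optimum.
    total = sum(tbl[()] for _, tbl in factors)

    # Trace back in reverse elimination order to recover the optimal states.
    chosen = {}
    for v, scope, best in reversed(decisions):
        chosen[v] = best[tuple(chosen[u] for u in scope)]
    return total, chosen


# Toy instance (made-up integer energies): four residues whose
# interactions form a cycle 0-1-2-3-0.
n_states = {0: 2, 1: 3, 2: 2, 3: 2}
one_body = {0: [0, 1], 1: [1, 0, 2], 2: [0, 1], 3: [1, 0]}
two_body = {
    (0, 1): [[0, 2, 1], [1, 0, 3]],
    (1, 2): [[0, 1], [2, 0], [1, 1]],
    (2, 3): [[3, 0], [0, 2]],
    (0, 3): [[1, 0], [0, 2]],
}
energy, rotamers = min_energy_assignment(n_states, one_body, two_body)
print(energy, rotamers)
```

Storing each two-body table on its edge mirrors the observation that the graph itself suggests where the energies should live. A term that depends on more than two residues at once would need a factor with a larger scope, which this sketch does not attempt to model.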
