Fast gap‐free enumeration of conformations and sequences for protein design

Despite significant recent successes in structure‐based computational protein design, design algorithms must still improve to increase the biological accuracy of new designs. These algorithms search an exponentially large space of protein conformations, protein ensembles, and amino acid sequences to find globally optimal structures with a desired biological function. Improving the biological accuracy of designs requires increasing both the protein flexibility allowed during the search and the overall size of the design, while still guaranteeing that the lowest‐energy structures and sequences are found. Algorithms based on dead‐end elimination (DEE) and A* are the most prevalent provable algorithms in protein design. They can provably enumerate a gap‐free list of low‐energy protein conformations, which ensemble‐based algorithms that predict protein binding require. We present two classes of algorithmic improvements that greatly increase the efficiency of A*. First, we analyze how the order in which mutable residue positions are expanded within the A* tree affects the search, and we present a dynamic residue ordering that reduces the number of A* nodes that must be visited. Second, we propose new methods to tighten the conformational bounds used to estimate the energies of partial conformations during the A* search. The residue ordering techniques and improved bounds can be combined for additional gains in efficiency. Our enhancements enable all A*‐based methods to search protein conformation space more fully, which will ultimately improve the accuracy of complex, biomedically relevant designs. Proteins 2015; 83:1859–1877. © 2015 Wiley Periodicals, Inc.
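As a rough illustration of the ideas summarized above, the following Python sketch enumerates conformations in nondecreasing energy order with A* over a pairwise-decomposable energy function. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `E1` (per-position rotamer energies), `E2` (pairwise energies keyed by position pairs), `lower_bound`, `expand`, and `enumerate_conformations` are hypothetical, the bound is a simple per-position minimum rather than the improved bounds the abstract refers to, and the "dynamic" rule (expand the position whose weakest child has the highest bound) is only one plausible ordering heuristic chosen for illustration.

```python
import heapq
import itertools


def pair(E2, i, r, j, s):
    """Pairwise energy of rotamer r at position i and rotamer s at position j.
    E2 is keyed by (i, j) with i < j (an assumed layout)."""
    return E2[(i, j)][r][s] if i < j else E2[(j, i)][s][r]


def lower_bound(assign, E1, E2):
    """Score f = g + h for a partial assignment (None marks an unassigned position).
    g is the exact energy of the assigned positions; h never overestimates the rest."""
    n = len(assign)
    done = [i for i in range(n) if assign[i] is not None]
    todo = [i for i in range(n) if assign[i] is None]
    g = sum(E1[i][assign[i]] for i in done)
    g += sum(pair(E2, i, assign[i], j, assign[j])
             for i, j in itertools.combinations(done, 2))
    h = 0.0
    for j in todo:
        # Best case for position j: its own energy, its interactions with the
        # assigned positions, and the best possible interaction with each
        # later unassigned position (each unassigned pair is counted once).
        h += min(
            E1[j][s]
            + sum(pair(E2, i, assign[i], j, s) for i in done)
            + sum(min(pair(E2, j, s, k, t) for t in range(len(E1[k])))
                  for k in todo if k > j)
            for s in range(len(E1[j]))
        )
    return g + h


def expand(assign, pos, E1, E2):
    """Return (bound, child) for every rotamer choice at position pos."""
    children = []
    for r in range(len(E1[pos])):
        child = assign[:pos] + (r,) + assign[pos + 1:]
        children.append((lower_bound(child, E1, E2), child))
    return children


def enumerate_conformations(E1, E2, dynamic=True):
    """Yield (energy, conformation) pairs in nondecreasing energy order."""
    tie = itertools.count()  # tiebreaker so the heap never compares tuples with None
    root = (None,) * len(E1)
    heap = [(lower_bound(root, E1, E2), next(tie), root)]
    while heap:
        f, _, assign = heapq.heappop(heap)
        todo = [i for i, r in enumerate(assign) if r is None]
        if not todo:
            # Fully assigned: f is the exact energy, and no node left in the
            # heap can lead to a lower-energy conformation.
            yield f, assign
            continue
        if dynamic:
            # Illustrative dynamic ordering: pick the position whose lowest
            # child bound is highest.
            children = max((expand(assign, p, E1, E2) for p in todo),
                           key=lambda cs: min(b for b, _ in cs))
        else:
            # Static ordering: always expand the first unassigned position.
            children = expand(assign, todo[0], E1, E2)
        for bound, child in children:
            heapq.heappush(heap, (bound, next(tie), child))
```

Because the heuristic never overestimates the energy of any completion, each fully assigned conformation popped from the queue is the next lowest in energy, so consuming the generator until a chosen energy window is exceeded gives a list with no gaps. Both orderings return the same list; they differ only in how many nodes are expanded along the way.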
