BBK* (Branch and Bound over K*): A Provable and Efficient Ensemble-Based Algorithm to Optimize Stability and Binding Affinity over Large Sequence Spaces

Protein design algorithms that compute binding affinity search for sequences with an energetically favorable free energy of binding. Recent work shows that two design principles improve the biological accuracy of protein design: ensemble-based design and continuous conformational flexibility. Ensemble-based algorithms capture a measure of the entropic contributions to binding affinity, \(K_a\). Designs that use backbone flexibility and continuous side-chain flexibility model conformational flexibility more accurately. A third design principle, provable guarantees of accuracy, ensures that an algorithm computes the best sequences defined by the input model (i.e., input structures, energy function, and allowed protein flexibility). However, previous provable methods that model ensembles and continuous flexibility are single-sequence algorithms, which are very costly: their running time is linear in the number of sequences and thus exponential in the number of mutable residues. To address these computational challenges, we introduce a new protein design algorithm, \(BBK^*\), that retains all of the aforementioned design principles yet provably and efficiently computes the tightest-binding sequences. A key innovation of \(BBK^*\) is the multi-sequence (MS) bound: \(BBK^*\) efficiently computes a single provable upper bound to approximate \(K_a\) for a combinatorial number of sequences, and entirely avoids single-sequence computation for all provably suboptimal sequences. Thus, to our knowledge, \(BBK^*\) is the first provable, ensemble-based \(K_a\) algorithm to run in time sublinear in the number of sequences. Computational experiments on 204 protein design problems show that \(BBK^*\) finds the tightest-binding sequences while approximating \(K_a\) for up to \(10^5\)-fold fewer sequences than exhaustive enumeration. Furthermore, for 51 protein-ligand design problems, \(BBK^*\) provably approximates \(K_a\) up to 1982-fold faster than the previous state-of-the-art iMinDEE/\(A^*\)/\(K^*\) algorithm. Therefore, \(BBK^*\) not only accelerates protein designs that are possible with previous provable algorithms, but also efficiently performs designs that are too large for previous methods.
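To make the idea of a multi-sequence bound concrete, here is a minimal sketch of best-first branch and bound over sequence space. It assumes a hypothetical toy model in which each sequence's partition functions factorize by position, so that the K* score (the ratio of the bound-complex partition function to the product of the unbound partition functions, \(Z_{PL}/(Z_P Z_L)\)) becomes a sum of per-position log terms. The tables `log_zb` and `log_zu` and the functions `log_kstar_bound` and `best_first_kstar` are illustrative names, not OSPREY or BBK* APIs. The bound used here is a simple one: the largest complex factor divided by the smallest unbound factor at each unassigned position. It is not necessarily the exact MS bound from the paper, which is computed over conformational ensembles. It does show the key property, though: one bound covers every completion of a partial sequence, so whole subtrees are never scored individually.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# Hypothetical toy model: log partition-function factors per mutable position and
# amino acid. Real K* computations sum Boltzmann weights over conformational
# ensembles; here each sequence's partition functions factorize by position.
LogTable = List[Dict[str, float]]


def log_kstar_bound(partial: Tuple[str, ...], log_zb: LogTable, log_zu: LogTable) -> float:
    """Upper bound on log K* for every full sequence that extends `partial`.

    Complex (numerator): largest factor at each unassigned position.
    Unbound states (denominator): smallest factor at each unassigned position.
    For a full sequence nothing is unassigned, so the bound equals log K* exactly.
    """
    total = 0.0
    for i in range(len(log_zb)):
        if i < len(partial):
            aa = partial[i]
            total += log_zb[i][aa] - log_zu[i][aa]
        else:
            total += max(log_zb[i].values()) - min(log_zu[i].values())
    return total


def best_first_kstar(log_zb: LogTable, log_zu: LogTable, k: int = 1):
    """Return the top-k sequences by log K* and how many full sequences were scored."""
    n = len(log_zb)
    counter = itertools.count()  # unique tie-breaker for nodes with equal bounds
    heap = [(-log_kstar_bound((), log_zb, log_zu), next(counter), ())]
    results, full_scored = [], 0
    while heap and len(results) < k:
        neg_bound, _, partial = heapq.heappop(heap)
        if len(partial) == n:
            # The bound is exact for a full sequence, and every node still in the
            # heap has a bound no larger, so this sequence is the next best.
            results.append(("".join(partial), -neg_bound))
            continue
        i = len(partial)
        for aa in log_zb[i]:
            child = partial + (aa,)
            if len(child) == n:
                full_scored += 1  # stands in for an expensive single-sequence computation
            heapq.heappush(heap, (-log_kstar_bound(child, log_zb, log_zu), next(counter), child))
    return results, full_scored


if __name__ == "__main__":
    # Three mutable positions with two allowed amino acids each: 8 sequences in total.
    log_zb = [{"A": 2.0, "V": 1.0}, {"L": 3.0, "I": 0.5}, {"F": 1.5, "Y": 1.0}]
    log_zu = [{"A": 1.0, "V": 0.8}, {"L": 1.0, "I": 0.4}, {"F": 0.5, "Y": 0.2}]
    top, scored = best_first_kstar(log_zb, log_zu, k=1)
    print(top, scored)  # prints: [('ALF', 4.0)] 2
```

On this toy instance, the search certifies ALF as the best sequence after scoring only 2 of the 8 full sequences, because the partial sequences starting with V and AI carry bounds that fall below ALF's exact score. Tighter bounds prune more of the tree. This is why the quality of the bound, and not only the search order, determines how far below exhaustive enumeration the number of scored sequences falls.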
