A new framework for computational protein design through cost function network optimization

MOTIVATION: The main challenge for structure-based computational protein design (CPD) remains the combinatorial nature of the search space. Even in its simplest fixed-backbone formulation, CPD is an NP-hard problem, which prevents the exact exploration of complex systems that define large sequence-conformation spaces.

RESULTS: We present a CPD framework based on cost function network (CFN) solving, a recent exact combinatorial optimization technique, to efficiently handle the highly complex combinatorial spaces encountered in various protein design problems. We show that the CFN-based approach is able to solve to optimality a variety of complex designs that often could not be solved using a standard CPD-dedicated tool or state-of-the-art exact operations research tools. Beyond identifying the optimal solution, the global minimum-energy conformation (GMEC), the CFN-based method is also able to quickly enumerate large ensembles of suboptimal solutions, which are of interest for rationally building experimental enzyme mutant libraries.

AVAILABILITY: The pipeline used to generate energetic models (based on a patched version of the open source solver Osprey 2.0), the conversion to CFN models (based on Perl scripts) and CFN solving (based on the open source solver toulbar2) are all available at http://genoweb.toulouse.inra.fr/~tschiex/CPD
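To make the CFN formulation concrete, the following minimal Python sketch models a toy fixed-backbone design as a cost function network: each residue position is a variable, each (amino acid, rotamer) choice is a value, and the energy is a sum of unary and pairwise integer costs. Everything in it is an illustrative assumption: the position names, rotamer labels, and cost values are invented, and the search is a plain depth-first branch and bound with a naive lower bound, not the soft local consistency machinery a dedicated CFN solver such as toulbar2 would use.

```python
import math

# Hypothetical toy instance: costs are invented integers (e.g. energies in
# tenths of an arbitrary unit); names are illustrative only.
VARS = ["pos1", "pos2", "pos3"]
DOMAINS = {
    "pos1": ["ALA_r0", "LEU_r0", "LEU_r1"],
    "pos2": ["SER_r0", "SER_r1"],
    "pos3": ["VAL_r0", "ILE_r0", "ILE_r1"],
}
# Unary cost functions: one cost per value of each variable.
UNARY = {
    "pos1": {"ALA_r0": 10, "LEU_r0": 5, "LEU_r1": 8},
    "pos2": {"SER_r0": 2, "SER_r1": 6},
    "pos3": {"VAL_r0": 4, "ILE_r0": 3, "ILE_r1": 9},
}
# Pairwise cost functions: only non-zero entries are listed; all are >= 0.
# Keys list the earlier variable (in VARS order) first.
PAIRWISE = {
    ("pos1", "pos2"): {("LEU_r0", "SER_r0"): 20, ("LEU_r1", "SER_r1"): 1},
    ("pos2", "pos3"): {("SER_r0", "ILE_r0"): 15},
    ("pos1", "pos3"): {("LEU_r0", "ILE_r0"): 7},
}


def extend_cost(assignment, var, val):
    """Cost added by setting var=val on top of the earlier assignments."""
    cost = UNARY[var][val]
    for prev_var, prev_val in assignment.items():
        cost += PAIRWISE.get((prev_var, var), {}).get((prev_val, val), 0)
    return cost


def lower_bound(depth):
    """Naive bound on the remaining cost: cheapest unary cost of each
    unassigned variable. Valid because pairwise costs are non-negative."""
    return sum(min(UNARY[v].values()) for v in VARS[depth:])


def find_gmec():
    """Branch and bound for the minimum-cost full assignment."""
    best = {"cost": math.inf, "assignment": None}

    def dfs(depth, assignment, cost):
        if cost + lower_bound(depth) >= best["cost"]:
            return  # cannot improve on the incumbent
        if depth == len(VARS):
            best["cost"], best["assignment"] = cost, dict(assignment)
            return
        var = VARS[depth]
        for val in DOMAINS[var]:
            step = extend_cost(assignment, var, val)
            assignment[var] = val
            dfs(depth + 1, assignment, cost + step)
            del assignment[var]

    dfs(0, {}, 0)
    return best["cost"], best["assignment"]


def solutions_within(upper_bound):
    """Enumerate every full assignment whose cost is <= upper_bound."""
    found = []

    def dfs(depth, assignment, cost):
        if cost + lower_bound(depth) > upper_bound:
            return
        if depth == len(VARS):
            found.append((cost, dict(assignment)))
            return
        var = VARS[depth]
        for val in DOMAINS[var]:
            step = extend_cost(assignment, var, val)
            assignment[var] = val
            dfs(depth + 1, assignment, cost + step)
            del assignment[var]

    dfs(0, {}, 0)
    return sorted(found, key=lambda s: s[0])


gmec_cost, gmec = find_gmec()
near_optimal = solutions_within(gmec_cost + 2)  # GMEC plus a small energy window
```

On this toy instance, `find_gmec()` returns the optimum and `solutions_within(gmec_cost + 2)` returns the suboptimal ensemble inside a fixed cost window above it, mirroring the two tasks described in the abstract. The naive bound is only sufficient for tiny examples; the large instances the paper targets would require much stronger bounds than this sketch provides.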
