Computational Protein Design as a Cost Function Network Optimization Problem

Proteins are chains of simple molecules called amino acids. The three-dimensional shape of a protein and its amino acid composition define its biological function. Over millions of years, living organisms have evolved and produced a large catalog of proteins. By exploring the space of possible amino acid sequences, protein engineering aims to design, in a similar way, tailored proteins with specific desirable properties. In Computational Protein Design (CPD), the challenge of identifying a protein that performs a given task is formulated as the combinatorial optimization of a complex energy function over amino acid sequences. In this paper, we introduce the CPD problem and some of the main approaches that have been used to solve it. We then show how this problem directly reduces to Cost Function Network (CFN) and 0/1 linear programming (0/1LP) optimization problems. We construct several real CPD instances to evaluate CFN and 0/1LP algorithms as implemented in the toulbar2 and cplex solvers. We observe that CFN algorithms bring substantial speedups compared not only to the CPD platform osprey but also to cplex.
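To make the reduction to a CFN more concrete, the following is a minimal sketch, not the paper's implementation. It assumes the energy decomposes into a constant, one unary term per design position, and pairwise terms between positions, so that each position becomes a variable and each candidate amino acid/conformation choice becomes a domain value. The name `solve_cfn`, the toy cost tables, and the brute-force enumeration are all illustrative assumptions; solvers such as toulbar2 rely on far more sophisticated search and bounding.

```python
from itertools import product


def solve_cfn(domains, unary, binary, constant=0):
    """Exhaustively minimize constant + unary costs + binary costs.

    domains: list of domain sizes, one per variable (hypothetical positions).
    unary:   unary[i][a] is the cost of giving value a to variable i.
    binary:  dict mapping (i, j) to a table with table[a][b] the cost of
             assigning a to variable i and b to variable j.
    """
    best_cost, best_assignment = None, None
    for assignment in product(*(range(size) for size in domains)):
        cost = constant
        cost += sum(unary[i][a] for i, a in enumerate(assignment))
        cost += sum(
            table[assignment[i]][assignment[j]]
            for (i, j), table in binary.items()
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


# Toy instance (made-up integer costs): three positions, two choices each.
domains = [2, 2, 2]
unary = [[1, 0], [0, 2], [3, 0]]
binary = {
    (0, 1): [[0, 4], [2, 0]],
    (1, 2): [[2, 0], [0, 1]],
}

best_cost, best_assignment = solve_cfn(domains, unary, binary)
print(best_cost, best_assignment)
```

Enumeration grows exponentially with the number of positions, which is why exact CPD relies on dedicated optimization algorithms rather than on a loop like this one.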
