Fitness landscape analysis around the optimum in computational protein design

The geometry and properties of the fitness landscapes of Computational Protein Design (CPD) problems are not well understood, because sampling methods have difficulty reaching the optima of these NP-hard problems and exploring their neighborhoods. In this paper, we enumerate all solutions within a 2 kcal/mol energy interval of the optimum for two CPD problems. We compute the number of local minima, the sizes of their attraction basins, and the local optima network. We then use several features to characterize the fitness landscapes, in particular their multimodality and ruggedness. The results reveal key differences between the two fitness landscapes and help explain the successes and failures of metaheuristics on CPD problems. Our analysis provides valuable, previously inaccessible information on the structure of the problem around the optima of the CPD instances (a multi-funnel structure), and could lead to the development of more efficient metaheuristics.
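
The enumeration-based analysis described above can be illustrated on a toy problem. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the following:

- a hypothetical three-position pairwise energy function (`UNARY`, `BINARY`), in arbitrary units rather than kcal/mol;
- a neighborhood in which exactly one position changes rotamer;
- best-improvement descent as the hill climber that defines attraction basins.

It brute-forces the whole toy space with `itertools.product`, which is only feasible for tiny instances; enumerating the energy window of a real CPD instance requires an exact method that this sketch does not reproduce. The `edges` counter is a simplified stand-in for a local optima network: it counts neighboring pairs of solutions whose members lie in different basins.

```python
import itertools
from collections import Counter

# Hypothetical toy instance (not from the paper): 3 positions, 3 rotamers each.
# UNARY[i][a]: energy of rotamer a at position i (arbitrary units).
# BINARY[(i, j)][a][b]: interaction energy of rotamer a at i with rotamer b at j.
DOMAINS = [3, 3, 3]
UNARY = [[0.0, 1.0, 0.5], [0.2, 0.0, 0.9], [0.4, 0.3, 0.0]]
BINARY = {
    (0, 1): [[0.0, 0.6, 0.2], [0.5, 0.0, 0.7], [0.1, 0.8, 0.0]],
    (1, 2): [[0.3, 0.0, 0.6], [0.0, 0.4, 0.2], [0.7, 0.1, 0.0]],
}


def energy(sol):
    """Total energy of a solution given as a tuple of rotamer indices."""
    e = sum(UNARY[i][a] for i, a in enumerate(sol))
    e += sum(table[sol[i]][sol[j]] for (i, j), table in BINARY.items())
    return e


def neighbors(sol):
    """Assumed neighborhood: change the rotamer at exactly one position."""
    for i, size in enumerate(DOMAINS):
        for a in range(size):
            if a != sol[i]:
                yield sol[:i] + (a,) + sol[i + 1:]


def hill_climb(sol):
    """Best-improvement descent; returns the local minimum it reaches."""
    while True:
        best = min(neighbors(sol), key=energy)  # ties: first in iteration order
        if energy(best) >= energy(sol):
            return sol  # no strictly better neighbor: local minimum
        sol = best


def landscape_near_optimum(delta):
    """Local minima, basin sizes and basin-to-basin edges within E_opt + delta."""
    space = itertools.product(*(range(d) for d in DOMAINS))
    energies = {s: energy(s) for s in space}
    e_opt = min(energies.values())
    window = [s for s, e in energies.items() if e <= e_opt + delta]
    # Descents only go downhill, so a descent started in the window stays in it.
    basin_of = {s: hill_climb(s) for s in window}
    basin_size = Counter(basin_of.values())
    # Simplified LON: count neighboring pairs whose members lie in different basins.
    edges = Counter()
    for s in window:
        for t in neighbors(s):
            if t in basin_of and basin_of[t] != basin_of[s]:
                edges[(basin_of[s], basin_of[t])] += 1
    return e_opt, basin_size, edges


if __name__ == "__main__":
    e_opt, basin_size, edges = landscape_near_optimum(delta=0.5)
    print(f"optimum energy: {e_opt:.2f}")
    print(f"local minima in window: {len(basin_size)}")
    for minimum, size in basin_size.most_common():
        print(minimum, f"E={energy(minimum):.2f}", f"basin size={size}")
    print(f"basin-to-basin edges: {len(edges)}")
```

Every improving move lowers the energy, so a descent that starts inside the window never leaves it. The local minimum reached from each enumerated solution is therefore always inside the window. Basin sizes, however, count only the solutions that lie within the window.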