Grid-based evolutionary strategies applied to the conformational sampling problem

Computational simulation of conformational sampling in general, and of macromolecular folding in particular, is one of the most important and yet most challenging applications of computer science in biology and medicinal chemistry. The advent of GRID computing may trigger major progress in this field. This paper presents our first attempts to design GRID-based conformational sampling strategies that explore the extremely rugged energy response surface as a function of molecular geometry, searching for low-energy zones in phase spaces with hundreds of degrees of freedom. We have generalized the classical island model deployment of genetic algorithms (GA) to a "planetary" model, in which each node of the GRID is treated as a "planet" harboring quasi-independent multi-island simulations based on a hybrid GA-driven sampling approach. Although different "planets" do not communicate with each other, which minimizes inter-CPU exchanges on the GRID, each new simulation benefits from the preliminary knowledge extracted from the centralized pool of already visited geometries, which is kept on the dispatcher machine and disseminated to every new "planet". This "panspermic" strategy allows new simulations either to be attracted towards an apparently promising phase space zone (biasing strategies, intensification procedures) or to avoid areas that have already been sampled in depth (tabu areas). Successful folding of mini-proteins typically used as benchmarks for all-atom protein simulations has been observed, although the reproducibility of these highly stochastic simulations in huge problem spaces still needs improvement. Work on two structured peptides (the "tryptophan cage" 1L2Y and the "tryptophan zipper" 1LE1) used as benchmarks for all-atom protein folding simulations has shown that the planetary model is able to reproducibly sample conformers from the neighborhood of the native geometries.
However, within these neighborhoods (that is, within ensembles of conformers similar to the models published on the basis of experimental geometry determinations), the energy landscapes are still extremely rugged. The simulations therefore generally produce "correct" geometries (similar enough to the experimental model for any practical purpose) that unfortunately sometimes correspond to relatively high energy levels and thus appear less stable than the most stable misfolded conformers. The method thus reproducibly visits the native phase space zone but fails to reproducibly hit the bottom of its rugged energy well. Intensification of local sampling may in principle solve this problem, but it is limited by computational resources. Finding the optimal time point at which a phase space zone should stop being intensively searched and be declared tabu is a very difficult problem that still awaits a practically useful solution.
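
The dispatcher-and-planets scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy `energy` function, the flat list used as the pool, the fixed `tabu_radius` and penalty, the single-population GA inside each planet (instead of a multi-island hybrid GA), and the sequential launch of planets are all simplifying assumptions, and every name in the block is hypothetical.

```python
import math
import random


def energy(x):
    """Toy rugged landscape standing in for a force-field energy (assumption)."""
    return sum(xi * xi + 3.0 * math.sin(5.0 * xi) for xi in x)


def distance(a, b):
    """Euclidean distance between two geometries (vectors of coordinates)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def run_planet(rng, pool, mode, dims=4, pop_size=20, generations=40,
               tabu_radius=0.5):
    """One quasi-independent run; it only reads its own snapshot of the pool."""
    if mode == "bias" and pool:
        # Intensification: seed the population around the best known geometry.
        best = min(pool, key=lambda entry: entry[1])[0]
        pop = [[g + rng.gauss(0, 0.3) for g in best] for _ in range(pop_size)]
    else:
        pop = [[rng.uniform(-3, 3) for _ in range(dims)]
               for _ in range(pop_size)]

    def fitness(x):
        score = energy(x)
        if mode == "tabu":
            # Diversification: penalize geometries near already visited ones.
            score += sum(5.0 for p, _ in pool if distance(x, p) < tabu_radius)
        return score

    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child[rng.randrange(dims)] += rng.gauss(0, 0.2)   # point mutation
            children.append(child)
        pop = parents + children

    # Report the three best geometries with their unpenalized energies.
    pop.sort(key=energy)
    return [(x, energy(x)) for x in pop[:3]]


def dispatcher(n_planets=6, seed=0):
    """Launch planets one after another, feeding each a copy of the pool."""
    rng = random.Random(seed)
    pool = []  # centralized pool of (geometry, energy) pairs
    for k in range(n_planets):
        mode = "explore" if k == 0 else ("bias" if k % 2 else "tabu")
        snapshot = list(pool)  # planets never talk to each other
        planet_rng = random.Random(rng.randrange(2 ** 32))
        pool.extend(run_planet(planet_rng, snapshot, mode))
    return pool


if __name__ == "__main__":
    results = dispatcher()
    best_geometry, best_energy = min(results, key=lambda entry: entry[1])
    print(len(results), "geometries pooled; best energy:", round(best_energy, 3))
```

In this sketch the only information flow is from the pool to a newly started planet and from a finished planet back to the pool, which mirrors the "panspermic" idea. When to switch a zone from "bias" to "tabu" is simply hard-coded by planet index here; the abstract identifies that decision as an open problem.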
