A grid-based genetic algorithm combined with an adaptive simulated annealing for protein structure prediction

A hierarchical hybrid model of parallel metaheuristics is proposed, combining an evolutionary algorithm with adaptive simulated annealing. The algorithms are executed in a grid environment using three parallelization strategies: the synchronous multi-start model, parallel evaluation of different solutions, and an insular (island) model with asynchronous migrations. In addition, a conjugate gradient local search method is applied at different stages of the exploration process. The algorithms were evaluated on the protein structure prediction problem, using as benchmarks the tryptophan-cage protein (Brookhaven Protein Data Bank ID: 1L2Y), the tryptophan-zipper protein (PDB ID: 1LE1) and the α-cyclodextrin complex. Experiments were performed on a nationwide grid infrastructure spanning six distinct administrative domains and gathering nearly 1,000 CPUs. The complexity of the protein structure prediction problem remains prohibitive for large proteins, which makes parallel computing on the computational grid essential for solving it efficiently.
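The hybrid scheme described above can be illustrated with a small sequential sketch. This is not the authors' implementation: the energy function is a toy stand-in (a Rastrigin-style surface rather than a molecular force field), the annealing step is plain simulated annealing with geometric cooling rather than adaptive simulated annealing, migration is done synchronously in a single process rather than asynchronously on a grid, and the conjugate gradient local search is omitted. All function names and parameter values are hypothetical.

```python
import math
import random


def energy(x):
    # Toy stand-in for a conformational energy function (assumption, not a force field).
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)


def anneal(x, rng, steps=200, t0=1.0, cooling=0.98, step_size=0.1):
    """Plain simulated annealing with geometric cooling; returns the best point seen."""
    current, e_cur = list(x), energy(x)
    best, e_best = list(current), e_cur
    t = t0
    for _ in range(steps):
        cand = [xi + rng.gauss(0.0, step_size) for xi in current]
        e_cand = energy(cand)
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        if e_cand <= e_cur or rng.random() < math.exp((e_cur - e_cand) / t):
            current, e_cur = cand, e_cand
            if e_cur < e_best:
                best, e_best = list(current), e_cur
        t *= cooling
    return best


def evolve(island, rng, mutation=0.3):
    """One generation: tournament selection, uniform crossover, Gaussian mutation, elitism."""
    def pick():
        a, b = rng.sample(island, 2)
        return a if energy(a) < energy(b) else b

    children = [list(min(island, key=energy))]  # keep the current best unchanged
    while len(children) < len(island):
        p, q = pick(), pick()
        child = [pi if rng.random() < 0.5 else qi for pi, qi in zip(p, q)]
        child = [c + rng.gauss(0.0, mutation) if rng.random() < 0.2 else c for c in child]
        children.append(child)
    return children


def hybrid_search(n_islands=4, pop=12, dim=3, generations=30, migrate_every=5, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop)]
               for _ in range(n_islands)]
    for g in range(1, generations + 1):
        islands = [evolve(isl, rng) for isl in islands]
        if g % migrate_every == 0:
            # Ring migration: each island's worst individual is replaced by its neighbour's best.
            bests = [min(isl, key=energy) for isl in islands]
            for i, isl in enumerate(islands):
                worst = max(range(len(isl)), key=lambda k: energy(isl[k]))
                isl[worst] = list(bests[i - 1])
    ga_best = min((ind for isl in islands for ind in isl), key=energy)
    refined = anneal(ga_best, rng)  # annealing refines the best evolutionary solution
    return ga_best, refined


if __name__ == "__main__":
    ga_best, refined = hybrid_search()
    print("GA best energy:     ", energy(ga_best))
    print("After annealing:    ", energy(refined))
```

Because `anneal` returns the best point it has visited, starting from the evolutionary result, the refined energy can never exceed the energy of the solution it was given. In a grid setting each island, and each energy evaluation within an island, would be a natural unit of parallel work.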
