Twin Removal in Genetic Algorithms for Protein Structure Prediction Using Low-Resolution Model

This paper examines how twins affect the population of a genetic algorithm (GA) used for conformational search, and presents measures for removing them. It is conclusively shown that a twin removal strategy considerably improves GA performance on complex ab initio protein structure prediction (PSP) problems in a low-resolution model. Without twin removal, crossover and mutation can become ineffective because successive generations lose the ability to produce significantly different offspring, which can cause the search to stall. The paper also relaxes the definition of chromosomal twins so that the removal strategy covers not only identical chromosomes but also highly correlated ones. Empirical results consistently show significant improvements on PSP problems.
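As an illustration, a relaxed twin removal step might look like the minimal sketch below. It makes several assumptions that are not taken from the paper. Chromosomes are strings over a hypothetical relative-move alphabet `FLR` (forward, left, right). Positional match ratio stands in for the paper's correlation measure. The `threshold` of 0.8 is illustrative only. The population is assumed to be sorted best-first, so keeping the first member of each twin group keeps the fittest one. Any chromosome judged a twin is replaced with a fresh random one, which keeps the population size constant. All names here (`similarity`, `remove_twins`, `random_chromosome`) are hypothetical.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical relative-move alphabet for a square lattice (assumption):
# F = forward, L = turn left, R = turn right.
MOVES = "FLR"


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of positions where two equal-length chromosomes agree.

    Used here as a stand-in for a correlation measure (assumption).
    """
    if len(a) != len(b):
        raise ValueError("chromosomes must have equal length")
    if not a:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def random_chromosome(length: int, rng: random.Random) -> str:
    """Draw a random move string. Lattice self-avoidance is NOT checked."""
    return "".join(rng.choice(MOVES) for _ in range(length))


def remove_twins(population: List[str], threshold: float = 0.8,
                 rng: Optional[random.Random] = None) -> List[str]:
    """Replace near-duplicate chromosomes with fresh random ones.

    A chromosome counts as a twin if its similarity to any already-kept
    chromosome is >= threshold. threshold=1.0 removes only identical twins;
    lower values also remove highly similar chromosomes.
    The population is assumed to be sorted best-first.
    """
    rng = rng or random.Random()
    kept: List[str] = []
    n_twins = 0
    for chrom in population:
        if any(similarity(chrom, k) >= threshold for k in kept):
            n_twins += 1  # twin of a better (earlier) chromosome
        else:
            kept.append(chrom)
    length = len(population[0]) if population else 0
    fresh = [random_chromosome(length, rng) for _ in range(n_twins)]
    return kept + fresh
```

With `threshold=1.0`, only exact copies are removed. Lowering it to 0.8 also removes `"FLRFR"` as a twin of `"FLRFL"`, because the two agree at 4 of 5 positions. In a full GA, such a step would plausibly run once per generation before selection. A lattice-model GA would also need to check that the replacement chromosomes describe self-avoiding walks, which this sketch omits.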
