DFS-generated pathways in GA crossover for protein structure prediction

Genetic algorithms (GAs), as nondeterministic conformational search techniques, are promising for solving protein structure prediction (PSP) problems. The crossover operator of a GA supports the formation of candidate conformations by exchanging and sharing promising sub-conformations. However, because the optimum PSP conformation is usually compact, crossover may produce many invalid conformations, that is, walks that are not self-avoiding. Although a converging conformation produced by crossover has only a limited number of pathways available, combining crossover with depth-first search (DFS) can partially reveal potential pathways and turn an otherwise invalid crossover into a valid, successful one. Random conformations are frequently used in GA applications, both for initialization and for maintaining diversity. A generator that relies on random moves only has exponential time complexity when generating random conformations, whereas the DFS-based random conformation generator has linear time complexity and is faster in practice. We have performed extensive experiments using popular 2D models, as well as 3D models, to support our hypothesis empirically.
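
The idea of a DFS-based random conformation generator can be illustrated with a minimal sketch. It assumes a 2D square lattice and represents a conformation as a list of lattice coordinates; the names `dfs_random_conformation`, `is_self_avoiding`, and `MOVES_2D` are hypothetical and are not taken from the paper. The sketch shows only the general technique (a randomized depth-first search that backtracks out of dead ends instead of restarting the walk) and makes no claim about the authors' implementation or its complexity.

```python
import random

# Assumption: 2D square lattice with four unit moves (not specified in the abstract).
MOVES_2D = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def is_self_avoiding(path):
    """A conformation is valid if no lattice point is visited twice."""
    return len(set(path)) == len(path)


def dfs_random_conformation(length, rng=None, moves=MOVES_2D):
    """Build a random self-avoiding walk with `length` lattice points.

    Hypothetical helper: a randomized DFS that backtracks when the walk
    traps itself, rather than discarding the whole walk and starting over.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    rng = rng or random.Random()

    path = [(0, 0)]
    occupied = {(0, 0)}

    def shuffled_moves():
        options = list(moves)
        rng.shuffle(options)
        return options

    # untried[i] holds the moves not yet attempted from path[i].
    untried = [shuffled_moves()]

    while path:
        if len(path) == length:
            return path
        options = untried[-1]
        advanced = False
        while options:
            dx, dy = options.pop()
            x, y = path[-1]
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                untried.append(shuffled_moves())
                advanced = True
                break
        if not advanced:
            # Dead end: undo the last placement and try another branch.
            occupied.discard(path.pop())
            untried.pop()
    return None  # Search space exhausted without reaching the target length.


walk = dfs_random_conformation(20, random.Random(0))
print(len(walk), is_self_avoiding(walk))
```

A generator based on random moves alone would have to discard the walk and restart whenever it trapped itself; the backtracking stack in this sketch is what lets a DFS-style generator recover locally instead.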
