An efficient encoding for simplified protein structure prediction using genetic algorithms

Protein structure prediction is one of the most challenging problems in computational biology and has remained unsolved for decades. In a simplified version of the problem, the task is to find a self-avoiding walk with minimum free energy on a discrete lattice, given an energy matrix. Genetic algorithms currently produce state-of-the-art results for simplified protein structure prediction. However, their performance depends largely on the encoding they use to represent protein structures and on the twin removal technique they use to eliminate duplicate solutions from the current population. In this paper, we present a new, efficient encoding for protein structures. Our encoding is non-isomorphic in nature and results in efficient twin removal. This helps the search algorithm diversify and explore a larger area of the search space. In addition, we propose an approximate matching scheme for removing nearly identical solutions from the population. Our encoding algorithm is generic and applicable to any lattice type. On the standard benchmark proteins, our techniques significantly improve the state-of-the-art genetic algorithm for the hydrophobic-polar (HP) energy model on the face-centered-cubic (FCC) lattice.
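
To make the simplified problem concrete, the following minimal Python sketch evaluates one candidate structure under the HP model on the FCC lattice. It is an illustration only and not the encoding proposed in the paper: the names `FCC_NEIGHBOURS`, `is_self_avoiding_walk`, and `hp_energy` are hypothetical, and the sketch assumes the usual HP convention in which each pair of non-consecutive hydrophobic (H) residues on adjacent lattice points contributes -1 to the energy.

```python
from itertools import product

# The 12 FCC unit vectors: exactly two coordinates are +/-1 and one is 0.
FCC_NEIGHBOURS = [v for v in product((-1, 0, 1), repeat=3)
                  if sum(abs(c) for c in v) == 2]
_NEIGHBOUR_SET = set(FCC_NEIGHBOURS)


def is_self_avoiding_walk(coords):
    """Return True if consecutive points are FCC neighbours and no point repeats."""
    if len(set(coords)) != len(coords):
        return False
    return all(
        tuple(b[k] - a[k] for k in range(3)) in _NEIGHBOUR_SET
        for a, b in zip(coords, coords[1:])
    )


def hp_energy(sequence, coords):
    """Count -1 for every H-H contact between residues that are not chain neighbours."""
    index_at = {point: i for i, point in enumerate(coords)}
    energy = 0
    for i, point in enumerate(coords):
        if sequence[i] != "H":
            continue
        for d in FCC_NEIGHBOURS:
            j = index_at.get((point[0] + d[0], point[1] + d[1], point[2] + d[2]))
            # j > i + 1 skips chain neighbours and counts each pair only once.
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


# Hypothetical four-residue example: residues 0 and 3 are in contact.
sequence = "HPHH"
coords = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (1, 0, 1)]
print(is_self_avoiding_walk(coords), hp_energy(sequence, coords))
```

A genetic algorithm for this problem would search over such walks for the one with the lowest energy; how the walks are encoded and how twins are detected is the subject of the paper and is not reproduced here.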
