An Enhanced Genetic Algorithm for Ab Initio Protein Structure Prediction

In vitro methods for protein structure determination are time-consuming, costly, and failure-prone. Because of these costs, alternative computer-based predictive methods have emerged. Predicting a protein's 3-D structure from its amino acid sequence alone, known as ab initio protein structure prediction (PSP), is computationally demanding because the search space is astronomically large and the energy models are extremely complex. Predictive methods have achieved some success, but only for small proteins (around 100 amino acids); studying large proteins therefore requires efficient algorithms, a reduced search space, and effective heuristics to guide the search. An on-lattice model offers a better test bed for rapidly developing and evaluating a new algorithm, so we use such a model on larger proteins (>150 amino acids) to enhance the genetic algorithm (GA) framework. In this paper, we formulate PSP as a combinatorial optimization problem that uses 3-D face-centered-cubic (FCC) lattice coordinates to reduce the search space and the hydrophobic-polar (HP) energy model to guide the search. The whole optimization process is controlled by an enhanced GA framework with four features: 1) an exhaustive generation approach to diversify the search; 2) a novel hydrophobic core-directed macro-mutation operator to intensify the search; 3) a per-generation duplicate-elimination strategy to prevent early convergence; and 4) a random-walk technique to recover from stagnation. On a set of standard benchmark proteins, our algorithm significantly outperforms state-of-the-art algorithms. We also show experimentally that our algorithm is robust, producing very similar results across different parameter settings.
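
To make the formulation concrete, the following is a minimal sketch of how a conformation on the FCC lattice could be scored under the HP energy model. It is an illustration under stated assumptions, not the implementation used in this work: the coordinate convention (the 12 FCC neighbors as vectors with exactly two nonzero entries of ±1), the contact energy of -1 per non-consecutive H-H contact, and the names FCC_NEIGHBORS, is_valid, and hp_energy are all chosen here for illustration.

```python
from itertools import combinations

# Assumed convention: the 12 nearest-neighbor vectors of the FCC lattice
# are those with exactly two coordinates equal to +/-1 and one equal to 0.
FCC_NEIGHBORS = frozenset(
    (dx, dy, dz)
    for dx in (-1, 0, 1)
    for dy in (-1, 0, 1)
    for dz in (-1, 0, 1)
    if abs(dx) + abs(dy) + abs(dz) == 2
)


def are_neighbors(a, b):
    """Return True if lattice points a and b are FCC nearest neighbors."""
    return (b[0] - a[0], b[1] - a[1], b[2] - a[2]) in FCC_NEIGHBORS


def is_valid(coords):
    """A conformation is valid if it is a self-avoiding walk in which
    consecutive residues occupy neighboring lattice points."""
    if len(set(coords)) != len(coords):
        return False  # two residues share a lattice point
    return all(are_neighbors(a, b) for a, b in zip(coords, coords[1:]))


def hp_energy(sequence, coords):
    """HP energy (assumed -1 per contact): count pairs of H residues that
    are lattice neighbors but not adjacent in the sequence."""
    energy = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i > 1 and sequence[i] == "H" and sequence[j] == "H":
            if are_neighbors(coords[i], coords[j]):
                energy -= 1
    return energy


# Hypothetical four-residue conformation, used only as an example.
example_coords = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (1, 0, 1)]
if is_valid(example_coords):
    print(hp_energy("HHHH", example_coords))
```

In a GA built on this scoring, each individual would be one such coordinate list, and a lower energy would mean a more compact hydrophobic core. Storing conformations as tuples of coordinates also makes them hashable, which is one simple way a per-generation duplicate check could be carried out.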
