Refining Genetic Algorithm twin removal for high-resolution protein structure prediction

To gain a better understanding of how proteins function, a process known as protein structure prediction (PSP) is carried out. Experimental methods for determining structure, such as X-ray crystallography and Nuclear Magnetic Resonance (NMR), can be time-consuming and inaccurate. This has given rise to numerous computational PSP approaches that try to determine a protein's three-dimensional conformation. A popular PSP search strategy is the Genetic Algorithm (GA). GAs offer a generic search approach, which alleviates the need to redefine the search strategy for each separate sequence. Although the working principles of GAs are remarkable, a serious problem inherent in the GA search process is the growth of twins, or identical chromosomes, within the population. Enhanced twin removal strategies are therefore crucial for any GA search that solves hard optimisation problems such as PSP. In this paper we explain our high-resolution GA feature-based resampling PSP approach and propose a twin removal strategy to further enhance its prediction accuracy. This includes investigating the optimal chromosome correlation factor (CCF) for our approach and defining a pre-built structure library for twin removal. We have also compared our GA approach with the popular Monte Carlo (MC) method for PSP. Our results indicate that, of all the CCF values we tested, a CCF value of 0.8 provided the best level of diversity within our GA population. It also generated, on average, more native-like structures than any of the other CCF values, and clearly demonstrated that twin removal is needed to obtain more accurate results when using GAs for PSP.
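The twin removal idea can be illustrated with a minimal sketch. The abstract does not define how the CCF is computed or how chromosomes are encoded, so everything here is an assumption made for illustration: chromosomes are equal-length lists of discrete values, similarity is the fraction of matching positions, and a chromosome counts as a twin when its similarity to an already kept chromosome is at least the CCF. The names `similarity` and `remove_twins` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two equal-length chromosomes agree.

    Assumption: this position-wise match ratio stands in for the paper's
    chromosome correlation measure, which the abstract does not specify.
    """
    if len(a) != len(b):
        raise ValueError("chromosomes must have equal length")
    if not a:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def remove_twins(population: List[List[int]], ccf: float = 0.8) -> List[List[int]]:
    """Keep a chromosome only if it is less than `ccf` similar to every
    chromosome already kept (assumed twin criterion: similarity >= ccf)."""
    kept: List[List[int]] = []
    for chrom in population:
        if all(similarity(chrom, k) < ccf for k in kept):
            kept.append(chrom)
    return kept


population = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 5],  # identical to the first chromosome
    [1, 2, 3, 4, 0],  # 80% similar to the first chromosome
    [0, 0, 0, 0, 0],
]
survivors = remove_twins(population, ccf=0.8)
```

Under these assumptions, a CCF of 1.0 removes only exact duplicates, while lower values also remove near-duplicates and so enforce more diversity. In a full GA, each removed twin would typically be replaced, for example by a new random chromosome or one drawn from a pre-built structure library, to keep the population size constant; that step is omitted here.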
