A replica exchange Monte Carlo algorithm for protein folding in the HP model

Background: The ab initio protein folding problem consists of predicting protein tertiary structure from a given amino acid sequence by minimizing an energy function; it is one of the most important and challenging problems in biochemistry, molecular biology and biophysics. The ab initio protein folding problem is computationally challenging and has been shown to be NP-hard even when conformations are restricted to a lattice. In this work, we implement and evaluate the replica exchange Monte Carlo (REMC) method, which has already been applied very successfully to more complex protein models and other optimization problems with complex energy landscapes, in combination with the highly effective pull move neighbourhood in two widely studied Hydrophobic Polar (HP) lattice models.

Results: We demonstrate that REMC is highly effective for solving instances of the square (2D) and cubic (3D) HP protein folding problem. When using the pull move neighbourhood, REMC outperforms current state-of-the-art algorithms for most benchmark instances. Additionally, we show that this new algorithm provides a larger ensemble of ground-state structures than the existing state-of-the-art methods. Furthermore, it scales well with sequence length, and it finds significantly better conformations on long biological sequences and sequences with a provably unique ground-state structure, which is believed to be a characteristic of real proteins. We also present evidence that our REMC algorithm can fold sequences which exhibit significant interaction between termini in the hydrophobic core relatively easily.

Conclusion: We demonstrate that REMC utilizing the pull move neighbourhood significantly outperforms current state-of-the-art methods for protein structure prediction in the HP model on 2D and 3D lattices. This is particularly noteworthy, since so far, the state-of-the-art methods for 2D and 3D HP protein folding, in particular the pruned-enriched Rosenbluth method (PERM) and, to some extent, Ant Colony Optimisation (ACO), were based on chain growth mechanisms. To the best of our knowledge, this is the first application of REMC to HP protein folding on the cubic lattice, and the first extension of the pull move neighbourhood to a 3D lattice.
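
The abstract does not spell out the energy function or the exchange rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the usual HP convention (each pair of H residues that are lattice neighbours but not chain neighbours contributes -1) on the 2D square lattice, and the standard replica exchange acceptance rule with the Boltzmann constant set to 1. The names `hp_energy` and `swap_probability` are hypothetical, and the pull move neighbourhood is not shown.

```python
import math


def hp_energy(sequence, coords):
    """Energy of a 2D square-lattice conformation (assumed convention):
    -1 for every H-H pair that is adjacent on the lattice but not
    adjacent in the chain."""
    position = {xy: i for i, xy in enumerate(coords)}
    if len(position) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = position.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Probability of accepting an exchange between two replicas
    (standard Metropolis-style rule, assumed here)."""
    delta = (1.0 / temp_j - 1.0 / temp_i) * (energy_i - energy_j)
    return 1.0 if delta <= 0 else math.exp(-delta)


# A four-residue chain folded into a unit square brings its two H ends into contact.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))
```

With this rule, an exchange that moves the lower-energy conformation to the colder replica is always accepted, while the reverse exchange is accepted only with a probability that decays exponentially in the energy gap and the difference in inverse temperatures.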
