A Symmetry-Free Subspace for Ab initio Protein Folding Simulations

Ab initio protein structure prediction usually seeks a ground state in an extremely large conformational space, typically using stochastic search algorithms guided by a predefined energy function. However, every valid conformation in this search space usually has several counterparts that are reflected, rotated, or reflected-and-rotated forms of it. We loosely call these isometric conformations. In protein folding, isometric conformations correspond to the different rotational states produced by admissible protein structure transitions. In structure prediction, isometric conformations share the same energy value, so they not only significantly increase the search complexity but also degrade the stability of some local search algorithms. In this paper, we prove that there exists a subspace that is unique (no two conformations in it are isometric) and complete (for every valid conformation, the subspace contains a conformation that is a reflected or rotated form of it). We show that this subspace, which is about 1/24 of the conventional search space in the 3D lattice model and 1/8 in the 2D model, contains the lowest-energy conformation, and that all other isometric lowest-energy forms can then be obtained by rotating the protein. Our experiments show that the subspace can significantly speed up existing local search algorithms.
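To make the notions of a unique and complete subspace concrete, here is a minimal Python sketch for the 2D square lattice only. It assumes that conformations are encoded as strings of absolute moves (`R`, `L`, `U`, `D`). It also picks, as a hypothetical representative, the lexicographically smallest of a conformation's isometric images, and it uses a simple HP contact energy with a made-up sequence as a stand-in for the predefined energy function. These choices and all function names are illustrative assumptions, not the construction used in the paper.

```python
from itertools import product

# Absolute unit moves on the 2D square lattice (illustrative encoding, an assumption).
STEPS = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}
NAMES = {vec: name for name, vec in STEPS.items()}

# The 8 symmetries of the square lattice: 4 rotations and 4 reflections.
SYMMETRIES = [
    lambda x, y: (x, y),    # identity
    lambda x, y: (-y, x),   # rotate 90 degrees counterclockwise
    lambda x, y: (-x, -y),  # rotate 180 degrees
    lambda x, y: (y, -x),   # rotate 270 degrees counterclockwise
    lambda x, y: (x, -y),   # reflect across the x-axis
    lambda x, y: (-x, y),   # reflect across the y-axis
    lambda x, y: (y, x),    # reflect across the line y = x
    lambda x, y: (-y, -x),  # reflect across the line y = -x
]


def transform(conf, sym):
    """Apply one lattice symmetry to every move of a conformation."""
    return "".join(NAMES[sym(*STEPS[m])] for m in conf)


def isometric_forms(conf):
    """All rotated/reflected images of a conformation, including itself."""
    return {transform(conf, sym) for sym in SYMMETRIES}


def canonical(conf):
    """Hypothetical representative: the lexicographically smallest image."""
    return min(isometric_forms(conf))


def coordinates(conf):
    """Lattice sites visited by a chain whose first residue sits at the origin."""
    coords = [(0, 0)]
    for m in conf:
        dx, dy = STEPS[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def is_self_avoiding(conf):
    coords = coordinates(conf)
    return len(set(coords)) == len(coords)


def enumerate_walks(n_moves):
    """Conventional search space: every self-avoiding walk of n_moves moves."""
    candidates = ("".join(p) for p in product(STEPS, repeat=n_moves))
    return [w for w in candidates if is_self_avoiding(w)]


def hp_energy(sequence, conf):
    """Illustrative HP energy: -1 per non-consecutive H-H lattice contact."""
    coords = coordinates(conf)
    index = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in STEPS.values():
            j = index.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbors and counts each pair once.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


seq = "HPHPPHH"  # hypothetical 7-residue sequence, i.e. 6 moves
walks = enumerate_walks(len(seq) - 1)
subspace = [w for w in walks if w == canonical(w)]
print(len(walks), len(subspace))

best_full = min(hp_energy(seq, w) for w in walks)
best_sub = min(hp_energy(seq, w) for w in subspace)
print(best_full == best_sub)

# Recover every lowest-energy conformation from the canonical ones by symmetry.
ground = [w for w in subspace if hp_energy(seq, w) == best_sub]
all_ground = set().union(*(isometric_forms(w) for w in ground))
print(len(ground), len(all_ground))
```

Run as a script, the first line printed should be `780 98`. Only the four straight walks have fewer than eight distinct images, so the canonical set holds (780 - 4)/8 + 1 = 98 walks, which is close to the 1/8 ratio for the 2D model. Because lattice symmetries preserve contacts, the minimum energy over the subspace equals the minimum over the full space. Testing `w == canonical(w)` costs eight transformations per conformation, so a search code might instead build the restriction directly into its move generation.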
