Feature space resampling for protein conformational search

De novo protein structure prediction requires locating the lowest-energy state of the polypeptide chain among a vast set of possible conformations. Powerful approaches include conformational space annealing, in which the search progressively focuses on the most promising regions of conformational space, and genetic algorithms, in which features of the best conformations identified so far are recombined. We describe a new approach that combines the strengths of both. Protein conformations are projected onto a discrete feature space that includes backbone torsion angles, secondary structure, and beta pairings. Each feature has one “native” value: the one found in the native structure. We begin with a large number of conformations generated in independent Monte Carlo structure prediction trajectories from Rosetta. Native values for each feature are predicted from the frequencies with which feature values occur and from the energy distribution of the conformations containing them. A second round of structure prediction trajectories is then guided by the predicted native feature distributions. We show that native features can be predicted at much higher than background rates, and that using the predicted feature distributions improves structure prediction in a benchmark of 28 proteins. The advantages of our approach are that features from many different input structures can be combined simultaneously without producing atomic clashes or otherwise physically unrealistic models, and that the features being recombined have a relatively high chance of being correct. Proteins 2010. © 2009 Wiley‐Liss, Inc.
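
The resampling idea above can be illustrated with a minimal Python sketch. It is not the statistical model used in the paper, which the abstract does not specify. It assumes a simple Boltzmann-style weighting in which each first-round conformation contributes to its feature values in proportion to exp(-(E - E_min)/T), so that both how often a value occurs and how low the energies of its conformations are influence the predicted distribution. The function names, the `temperature` parameter, and the feature labels such as `torsion_5` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def predict_feature_distributions(decoys, temperature=1.0):
    """Estimate a probability distribution over values for each discrete feature.

    decoys: list of (energy, features) pairs, where features maps a feature
    name to its discrete value in that conformation.
    ASSUMPTION: Boltzmann-style weighting stands in for the paper's predictor.
    """
    if not decoys:
        raise ValueError("need at least one decoy")
    e_min = min(energy for energy, _ in decoys)
    totals = defaultdict(lambda: defaultdict(float))
    for energy, features in decoys:
        # Low-energy decoys get weights near 1; high-energy decoys near 0.
        weight = math.exp(-(energy - e_min) / temperature)
        for name, value in features.items():
            totals[name][value] += weight
    distributions = {}
    for name, value_weights in totals.items():
        # Normalise over the decoys that actually carry this feature.
        norm = sum(value_weights.values())
        distributions[name] = {v: w / norm for v, w in value_weights.items()}
    return distributions


def sample_feature_values(distributions, rng):
    """Draw one value per feature; a second-round trajectory could be biased toward these."""
    chosen = {}
    for name, dist in distributions.items():
        values = list(dist)
        weights = [dist[v] for v in values]
        chosen[name] = rng.choices(values, weights=weights, k=1)[0]
    return chosen


# Toy first-round output: (energy, discrete features) per conformation.
decoys = [
    (-10.0, {"torsion_5": "A", "ss_5": "H"}),
    (-9.0, {"torsion_5": "A", "ss_5": "E"}),
    (-2.0, {"torsion_5": "B", "ss_5": "E"}),
    (-1.0, {"torsion_5": "B", "ss_5": "E"}),
]
dists = predict_feature_distributions(decoys, temperature=2.0)
guide = sample_feature_values(dists, random.Random(0))
```

In this toy input the value `"E"` for `ss_5` is the most frequent, yet `"H"` receives the higher predicted probability because it occurs in the lowest-energy conformation. Because values are sampled per feature and then used only to guide new trajectories, features from many input structures can be combined without directly splicing coordinates.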
