Resampling methods for protein structure prediction

Ab initio protein structure prediction entails predicting the three-dimensional conformation of a protein from its amino acid sequence without the use of an experimentally determined template structure. In this thesis, I present a new approach to ab initio protein structure prediction that divides the search problem into two parts: sampling in a space of discrete-valued structural features, and continuous search over conformations constrained to have the desired features. Both parts are carried out using Rosetta, a leading structure prediction algorithm. Rosetta is a Monte Carlo energy minimization method that requires many random restarts to find structures near the correct, or native, structure. Our methods, which we call resampling methods, use an initial round of Rosetta-generated local minima to learn properties of the energy landscape that guide a subsequent "resampling" round of Rosetta search toward better predictions. One of the main innovations of this thesis is to attempt to deduce from the initial set of Rosetta models not the entire native conformation but only a few specific features of it. Features include backbone torsion angles, per-residue secondary structure, exposure of residues to solvent, and a three-tiered hierarchy of beta-pairing features. For each feature there is one "native" value: the one found in the native structure. Native feature values are generally enriched in structures with low energy, because the native structure of a protein is significantly lower in energy than non-native structures and because the energy of a protein is, to some extent, a sum of spatially local contributions. We have developed two methods for feature-space resampling based on this observation. The first method uses feature selection to identify structural feature values that give rise to low energy; these values are then enriched in the resampling round.
The second, more sophisticated method updates the sampling distribution for all features at once, not just a selected few, by predicting the likelihood that each feature value is native. Our results indicate that both methods, especially the second, yield structure predictions significantly better than those produced by Rosetta alone.
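The observation behind both methods, that native feature values tend to be enriched among low-energy models, can be illustrated with a small sketch. This is not the thesis's implementation and does not call Rosetta. The decoy format, the feature naming (e.g. "ss_12" for the secondary structure of residue 12), the `top_fraction` cutoff, the enrichment `threshold`, and the mixing weight `alpha` are all hypothetical. `select_enriched` is a simple enrichment-ratio filter standing in for the feature selection methods, and `resampling_distribution` uses the smoothed low-energy frequency of each value as a crude proxy for the learned probability that the value is native.

```python
"""Hypothetical sketch of feature-space resampling over an initial set of models.

Each decoy is a (features, energy) pair. `features` maps a feature name,
e.g. "ss_12" for the secondary structure of residue 12 (an illustrative
naming scheme), to a discrete value. Lower energy is better. The cutoff,
threshold, and mixing rule are assumptions chosen for illustration.
"""
from collections import Counter


def value_counts(decoys):
    """Count occurrences of each (feature, value) pair across the decoys."""
    counts = Counter()
    for features, _energy in decoys:
        counts.update(features.items())
    return counts


def low_energy_subset(decoys, top_fraction=0.25):
    """Return the lowest-energy fraction of the decoys (at least one)."""
    ranked = sorted(decoys, key=lambda d: d[1])
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[:k]


def enrichment(decoys, top_fraction=0.25):
    """Frequency of each value among low-energy decoys divided by its overall frequency."""
    low = low_energy_subset(decoys, top_fraction)
    all_counts, low_counts = value_counts(decoys), value_counts(low)
    return {
        fv: (low_counts[fv] / len(low)) / (n / len(decoys))
        for fv, n in all_counts.items()
    }


def select_enriched(decoys, top_fraction=0.25, threshold=1.5):
    """Stand-in for method 1: keep feature values enriched at low energy."""
    scores = enrichment(decoys, top_fraction)
    return sorted(fv for fv, ratio in scores.items() if ratio >= threshold)


def resampling_distribution(decoys, top_fraction=0.25, alpha=0.5, pseudocount=1.0):
    """Stand-in for method 2: update the sampling distribution of every feature.

    The smoothed frequency of a value among low-energy decoys serves as a crude
    proxy for the probability that the value is native; it is mixed with the
    initial-round frequency using weight `alpha`. Assumes every decoy assigns
    a value to every feature, so each per-feature distribution sums to 1.
    """
    low = low_energy_subset(decoys, top_fraction)
    all_counts, low_counts = value_counts(decoys), value_counts(low)
    values_by_feature = {}
    for feature, value in all_counts:
        values_by_feature.setdefault(feature, []).append(value)
    dist = {}
    for feature, values in values_by_feature.items():
        denom = len(low) + pseudocount * len(values)
        dist[feature] = {
            v: (1 - alpha) * all_counts[(feature, v)] / len(decoys)
            + alpha * (low_counts[(feature, v)] + pseudocount) / denom
            for v in values
        }
    return dist


if __name__ == "__main__":
    # Toy data: two per-residue secondary-structure features (H, E, L).
    decoys = [
        ({"ss_1": "H", "ss_2": "H"}, -50.0),
        ({"ss_1": "H", "ss_2": "E"}, -45.0),
        ({"ss_1": "E", "ss_2": "E"}, -20.0),
        ({"ss_1": "L", "ss_2": "E"}, -10.0),
    ]
    print(select_enriched(decoys, top_fraction=0.5))  # [('ss_1', 'H'), ('ss_2', 'H')]
    print(resampling_distribution(decoys, top_fraction=0.5))
```

The mixing step keeps some probability on values seen in the initial round, so a resampling round built on this distribution would still explore non-enriched values rather than committing entirely to the low-energy consensus; the pseudocount prevents any value from receiving zero weight.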