Minimalist representations and the importance of nearest neighbor effects in protein folding simulations.

To investigate the level of representation required to simulate folding and predict structure, we test the ability of several reduced representations to identify native states in decoy libraries and to recover the native structure given advance knowledge of the very broad native Ramachandran basin assignments. Simplifications include removing the entire side chain or retaining only the Cβ atoms. Scoring functions are derived from an all-atom statistical potential that distinguishes between atoms and different residue types. Structures are obtained by minimizing the scoring function with a computationally rapid simulated annealing algorithm. Results are compared for simulations in which backbone conformations are sampled from a Protein Data Bank-based backbone rotamer library generated by either ignoring or including a dependence on the identity and conformation of the neighboring residues. Only when the Cβ atoms and nearest neighbor effects are included do the lowest-energy structures generally fall within 4 Å backbone root-mean-square deviation (RMSD) of the native structure, even though the initial configuration is highly expanded, with an average RMSD ≥ 10 Å. The side chains are reinserted into the Cβ models with minimal steric clash. Therefore, the detailed all-atom information lost in descending to a Cβ-level representation is recaptured to a large measure by backbone dihedral angle sampling that includes nearest neighbor effects, together with an appropriate scoring function.
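
The sampling-and-minimization loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny `LIBRARY` of (phi, psi) pairs, the `toy_score` function, the geometric cooling schedule, and all parameter values are hypothetical placeholders. In particular, the library here is keyed only on a coarse Ramachandran basin label, whereas the rotamer library in the abstract may also depend on the identity and conformation of neighboring residues, and the score here is a simple angular distance rather than a statistical potential.

```python
import math
import random

# Hypothetical toy library: a few (phi, psi) pairs in degrees per coarse basin.
# A neighbor-dependent library would use a richer key than the basin label alone.
LIBRARY = {
    "alpha": [(-63.0, -43.0), (-57.0, -47.0), (-70.0, -35.0)],
    "beta": [(-120.0, 130.0), (-135.0, 150.0), (-110.0, 120.0)],
    "ppII": [(-75.0, 145.0), (-65.0, 150.0)],
}


def toy_score(angles, target):
    """Placeholder score: summed squared periodic angle difference to a target."""
    total = 0.0
    for (phi, psi), (t_phi, t_psi) in zip(angles, target):
        for a, b in ((phi, t_phi), (psi, t_psi)):
            d = (a - b + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
            total += d * d
    return total


def anneal(basins, score, steps=5000, t_start=1000.0, t_end=1.0, seed=0):
    """Metropolis simulated annealing over library choices, one residue per move."""
    rng = random.Random(seed)
    angles = [rng.choice(LIBRARY[b]) for b in basins]
    current = score(angles)
    best, best_score = list(angles), current
    for step in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(len(angles))
        trial = list(angles)
        trial[i] = rng.choice(LIBRARY[basins[i]])  # resample one residue
        new = score(trial)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if new <= current or rng.random() < math.exp((current - new) / t):
            angles, current = trial, new
            if current < best_score:
                best, best_score = list(angles), current
    return best, best_score


basins = ["alpha", "alpha", "beta", "ppII"]
target = [LIBRARY[b][0] for b in basins]
best, best_score = anneal(basins, lambda a: toy_score(a, target))
```

Because each move only swaps in a conformation already present in the library, the search space is restricted to locally plausible backbone geometries. This is the role the abstract attributes to the rotamer library, and a real scoring function would replace `toy_score`.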
