Another look at the conditions for the extraction of protein knowledge‐based potentials

Protein knowledge‐based potentials are effective free energies obtained from databases of known protein structures. They are used to parameterize coarse‐grained protein models in many folding simulation and structure prediction methods. Two approaches are commonly used to derive knowledge‐based potentials. One assumes that the energy parameters optimize the stability of the native structure. The other assumes that interaction events are related to their energies according to the Boltzmann distribution and that they are distributed independently of other events (the quasi‐chemical approximation). Here, these assumptions are systematically tested by extracting contact energies from artificial databases of lattice proteins with predefined pairwise contact energies. Databases of protein sequences are designed either to satisfy the Boltzmann distribution at high or low temperatures or to simultaneously optimize native stability and folding kinetics. The quasi‐chemical approximation, with the ideal reference state, accurately reproduces the true energies for sequences that are Boltzmann distributed at high temperature (weakly interacting residues), but is less accurate at low temperatures, where the sequences correspond to energy minima and the residues are strongly interacting. To overcome this problem, an iterative procedure for Boltzmann distributed sequences is introduced, which accounts for correlations between interacting residues and eliminates the need for the quasi‐chemical approximation. In this case, the energies are accurately reproduced at any ensemble temperature. However, when the database of sequences designed for optimal stability and kinetics is used, the correlation between extracted and true energies is less than optimal with either method, exhibiting both random and systematic deviations from linearity. Therefore, neither the assumption that native structures are maximally stable nor the assumption that sequences are determined according to the Boltzmann distribution seems adequate for obtaining accurate energies. The limited number of sequences in the database and the inhomogeneous concentration of amino acids from one structure to another do not seem to be major obstacles to improving the quality of the extracted pairwise energies, with the exception of repulsive interactions. Proteins 2009. © 2008 Wiley‐Liss, Inc.
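
To make the quasi‐chemical approximation concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes a contact‐fraction form of the ideal reference state, in which the expected number of i–j contacts is the total number of contacts multiplied by the product of the fractions of contact ends belonging to each residue type. The energy is then taken as e_ij = −kT ln(N_ij / N_ij^expected). The function name `quasi_chemical_energies`, the dictionary layout of `counts`, and the two‐letter H/P alphabet in the usage lines are hypothetical choices made for illustration only.

```python
import math


def quasi_chemical_energies(counts, kT=1.0):
    """Estimate contact energies from observed contact counts.

    counts maps an unordered residue-type pair, stored once as a tuple
    (i, j), to the number of i-j contacts observed in the database.
    The reference state used here is an assumption of this sketch:
    contacts are treated as if the two contact ends were drawn
    independently from the pool of all contact ends.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no contacts in the database")

    # Count contact ends per residue type; an i-i contact contributes
    # two ends of type i, an i-j contact one end to each type.
    ends = {}
    for (i, j), n in counts.items():
        ends[i] = ends.get(i, 0) + n
        ends[j] = ends.get(j, 0) + n
    frac = {t: e / (2 * total) for t, e in ends.items()}

    energies = {}
    for (i, j), n in counts.items():
        if n == 0:
            continue  # the logarithm is undefined for unobserved pairs
        multiplicity = 1 if i == j else 2  # i-j and j-i are the same pair
        expected = total * multiplicity * frac[i] * frac[j]
        # Boltzmann inversion of the observed-to-expected ratio.
        energies[(i, j)] = -kT * math.log(n / expected)
    return energies


# Usage with made-up counts for a two-letter (H/P) alphabet.
example_counts = {("H", "H"): 40, ("H", "P"): 40, ("P", "P"): 20}
for pair, energy in quasi_chemical_energies(example_counts).items():
    print(pair, round(energy, 3))
```

Pairs observed more often than the reference state predicts receive negative (attractive) energies, and pairs observed less often receive positive ones. Because each pair is treated independently of all others, a sketch of this kind ignores the residue correlations that the abstract identifies as the source of error at low ensemble temperatures.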
