Statistical mechanics‐based method to extract atomic distance‐dependent potentials from protein structures

In this study, we developed a statistical mechanics‐based iterative method to extract statistical atomic interaction potentials from known, nonredundant protein structures. The method circumvents the long‐standing reference state problem that arises in deriving traditional knowledge‐based scoring functions by using rapid iterations through a physical, global convergence function. Unlike other parameter optimization methods, this physics‐based method converges rapidly, which makes it feasible to derive distance‐dependent, all‐atom statistical potentials and thereby preserve scoring accuracy. The derived potentials, referred to as ITScore/Pro, were validated on three diverse benchmarks: the high‐resolution decoy set, the AMBER benchmark decoy set, and the CASP8 decoy set. Significant improvement in performance was achieved. Finally, a comparison between the potentials of our model and those of a knowledge‐based scoring function with a randomized reference state revealed the reason for the better performance of our scoring function, which could provide useful insight into the development of other physical scoring functions. The potentials developed in this study are generally applicable to structural selection in protein structure prediction. © 2011 Wiley‐Liss, Inc.
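The abstract does not state the update rule, so the following is only a toy sketch of how an iterative extraction of this kind might look, not the ITScore/Pro implementation. It assumes a simple rule of the form u(r) ← u(r) + step · kT · [g_pred(r) − g_obs(r)], applied to a single atom pair over a few distance bins. The function names, the bin weights, and the observed distribution are all hypothetical and chosen for illustration. The point it shows is that the potential is adjusted until the predicted distribution reproduces the observed one, so no explicit reference state has to be supplied to the update.

```python
import math


def predicted_distribution(potential, bin_weights, kT=1.0):
    """Boltzmann-weighted bin probabilities for a toy one-pair system.

    bin_weights stands in for whatever the sampled ensemble contributes to
    each distance bin; the update rule below never inspects it directly.
    """
    boltz = [w * math.exp(-u / kT) for u, w in zip(potential, bin_weights)]
    total = sum(boltz)
    return [b / total for b in boltz]


def iterate_potential(observed, bin_weights, step=1.0, kT=1.0,
                      tol=1e-6, max_iter=10000):
    """Assumed update rule: u <- u + step * kT * (g_pred - g_obs)."""
    potential = [0.0] * len(observed)
    for _ in range(max_iter):
        predicted = predicted_distribution(potential, bin_weights, kT)
        # Stop once the predicted distribution matches the observed one.
        if max(abs(p - o) for p, o in zip(predicted, observed)) < tol:
            break
        # Raise the energy of over-populated bins, lower under-populated ones.
        potential = [u + step * kT * (p - o)
                     for u, p, o in zip(potential, predicted, observed)]
    return potential, predicted_distribution(potential, bin_weights, kT)


# Hypothetical data: observed pair-distance frequencies in five bins, and
# ensemble weights that grow with distance.
observed = [0.05, 0.40, 0.30, 0.15, 0.10]
bin_weights = [1.0, 2.0, 3.0, 4.0, 5.0]

potential, predicted = iterate_potential(observed, bin_weights)
for u, p, o in zip(potential, predicted, observed):
    print(f"u = {u:+.4f}   predicted = {p:.4f}   observed = {o:.4f}")
```

In this toy setting the converged potential equals −kT ln(g_obs / weight) up to an additive constant, which is the result a direct Boltzmann inversion would give if the correct reference state were known in advance.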
