Capturing native/native-like structures with a physico-chemical metric (pcSM) in protein folding.

Specifying the three-dimensional structure of a protein from its amino acid sequence, often called a "Grand Challenge" problem, has eluded a solution for over six decades. A modestly successful strategy has evolved over the last couple of decades, based on scoring functions (e.g. functions mimicking free energy) that can pick out native or native-like structures from an ensemble of decoys generated as plausible candidates for the native structure. Such a scoring function must discriminate the native from unfolded or misfolded structures quickly, and it must be validated on large data sets to generate sufficient confidence in the score. Here we develop a scoring function called pcSM that places the true native structure in the top 5 of an ensemble of candidate structures with 93% accuracy. If the native is removed from the ensemble of decoys, pcSM captures a near-native structure (RMSD ≤ 5 Å) in the top 10 with 86% accuracy. The parameters considered in pcSM are a C-alpha Euclidean metric, secondary structural propensity, surface areas and an intramolecular energy function. pcSM has been tested on 415 systems comprising 142,698 decoys (from public decoy sets and CASP; the largest test set reported in the literature to date). The average rank of the native is 2.38, a significant improvement over values reported in the literature. In silico protein structure prediction requires robust scoring techniques, and pcSM is readily amenable to integration into a protein structure prediction strategy. The tool is freely available at http://www.scfbio-iitd.res.in/software/pcsm.jsp.
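The evaluation quantities quoted above (top-5 native detection, average native rank, and top-10 near-native capture at RMSD ≤ 5 Å) can be illustrated with a minimal sketch. This is not the pcSM implementation: the `Candidate` record, the function names, and the convention that a lower score is better are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    score: float             # hypothetical score; lower is ASSUMED to be better
    rmsd: float              # C-alpha RMSD to the native structure, in angstroms
    is_native: bool = False  # True only for the experimental native structure


def native_rank(system: List[Candidate]) -> Optional[int]:
    """Return the 1-based rank of the native after sorting by score, or None."""
    ordered = sorted(system, key=lambda c: c.score)
    for rank, cand in enumerate(ordered, start=1):
        if cand.is_native:
            return rank
    return None


def top_n_native_rate(systems: List[List[Candidate]], n: int = 5) -> float:
    """Fraction of systems whose native is ranked within the top n."""
    hits = 0
    for system in systems:
        rank = native_rank(system)
        if rank is not None and rank <= n:
            hits += 1
    return hits / len(systems)


def mean_native_rank(systems: List[List[Candidate]]) -> float:
    """Average rank of the native over the systems that contain one."""
    ranks = [r for r in (native_rank(s) for s in systems) if r is not None]
    return sum(ranks) / len(ranks)


def top_n_near_native_rate(systems: List[List[Candidate]],
                           n: int = 10, cutoff: float = 5.0) -> float:
    """Fraction of systems with a decoy of RMSD <= cutoff in the top n,
    after the native has been removed from the ensemble."""
    hits = 0
    for system in systems:
        decoys = sorted((c for c in system if not c.is_native),
                        key=lambda c: c.score)
        if any(c.rmsd <= cutoff for c in decoys[:n]):
            hits += 1
    return hits / len(systems)
```

Under these assumptions, each system is a list of `Candidate` records for one protein, and the three functions are applied to the list of all systems in a benchmark.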
