A global machine learning based scoring function for protein structure prediction

We present a knowledge‐based function that scores protein decoys according to their similarity to the native structure. A set of features is constructed to describe the structure and sequence of the entire protein chain. Furthermore, a qualitative relationship is established between the calculated features and the underlying electromagnetic interaction that dominates at this scale. The features are associated with residue–residue distances, residue–solvent distances, pairwise knowledge‐based potentials, and a four‐body potential. In addition, we introduce a new prediction target, the fitness score, which measures the similarity of a model to the native structure. This approach lets us extract information from both decoys and native structures, and it avoids problems previously associated with knowledge‐based potentials. The features were computed for a large set of native and decoy structures, and a back‐propagating neural network was trained to predict the fitness score. Overall, the new scoring potential proved superior to the knowledge‐based scoring functions used as its inputs. In particular, in the latest CASP experiment (CASP10), our method was ranked third for all targets and second for freely modeled hard targets among about 200 groups for top model prediction. Ours was the only method ranked in the top three both for all targets and for hard targets. These initial results show that the approach captures details missed by a broad spectrum of protein structure prediction methods. Source code and executables from this work are freely available at http://mathmed.org/#Software and http://mamiris.com/. Proteins 2014; 82:752–759. © 2013 Wiley Periodicals, Inc.
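
As a rough illustration of the training setup described above, the following sketch fits a one‐hidden‐layer network by back‐propagation to map a per‐structure feature vector to a fitness score, then ranks decoys by the predicted score. It is not the authors' implementation: the network size, the learning rate, the function names, the assumption that the fitness score is scaled to the interval (0, 1), and the synthetic four‐column feature matrix (standing in for the distance, solvent, pairwise and four‐body feature families) are all hypothetical choices made for the example.

```python
import numpy as np


def _forward(params, X):
    """Forward pass: tanh hidden layer, sigmoid output in (0, 1)."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    pred = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return h, pred


def train_fitness_regressor(X, y, hidden=8, lr=0.5, epochs=300, seed=0):
    """Full-batch gradient descent with back-propagation on mean squared error.

    X: (n_structures, n_features) feature matrix (hypothetical layout).
    y: (n_structures,) target fitness scores, assumed scaled to (0, 1).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        h, pred = _forward((W1, b1, W2, b2), X)
        err = pred - y[:, None]
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: chain rule through the sigmoid, then the tanh layer.
        dz = 2.0 * err * pred * (1.0 - pred) / n
        dW2 = h.T @ dz
        db2 = dz.sum(axis=0)
        da = (dz @ W2.T) * (1.0 - h ** 2)
        dW1 = X.T @ da
        db1 = da.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict_fitness(params, X):
    """Predicted fitness score for each structure."""
    return _forward(params, X)[1].ravel()


def rank_decoys(params, X):
    """Indices of structures ordered from highest to lowest predicted score."""
    return np.argsort(-predict_fitness(params, X))


# Synthetic stand-in data (NOT real protein features or CASP results).
_rng = np.random.default_rng(1)
X_demo = _rng.normal(size=(200, 4))
y_demo = 1.0 / (1.0 + np.exp(-(X_demo @ np.array([1.0, -0.5, 0.8, 0.3]))))
params_demo, losses_demo = train_fitness_regressor(X_demo, y_demo)
best_decoy = rank_decoys(params_demo, X_demo)[0]
```

Training on a continuous similarity target, rather than on a native versus decoy label, is what allows both native structures and decoys of varying quality to contribute to the fit; in this sketch that is reflected only in the regression loss.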
