ProVal: A protein‐scoring function for the selection of native and near‐native folds

A low‐resolution scoring function has been developed for selecting native and near‐native structures from a set of predicted structures for a given protein sequence. The scoring function, ProVal (Protein Validate), uses several variables, each describing an aspect of protein structure for which proximity to the native structure can be assessed quantitatively. Among the parameters included are a packing estimate, surface areas, and the contact order. Using these parameters as independent variables, a partial least squares for latent variables (PLS) model was built for each of the 28 decoy sets of structures generated for 22 different proteins. The Cα RMS of the candidate structures versus the experimental structure was used as the dependent variable. The final generalized scoring function was an average of all derived models, ensuring that the function was not optimized for specific fold classes or for the method used to generate the candidate folds. The crystal structure was scored best in 64% of the 28 test sets and was clearly separated from the decoys in many examples. In all other cases, in which the crystal structure did not rank first, it ranked within the top 10%. Thus, although its inherently low resolution prevents ProVal from distinguishing between predicted structures of similar overall fold quality, it can clearly be used as a primary filter to eliminate ∼90% of the fold candidates generated by current prediction methods from subsequent all‐atom modeling and further evaluation. The correlation between the predicted and actual Cα RMS values varies considerably between the candidate fold sets. Proteins 2003;53:000–000. © 2003 Wiley‐Liss, Inc.
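The averaging and filtering steps described in the abstract can be illustrated with a minimal sketch. It assumes each per-decoy-set model has already been reduced to a linear form (an intercept followed by one weight per descriptor) that predicts Cα RMS, so that averaging the models amounts to averaging their coefficients. The function names (`average_models`, `predict_rms`, `keep_best_fraction`), the descriptor ordering, and the 10% cutoff default are hypothetical choices for illustration and are not taken from the paper; fitting the PLS models themselves is not shown.

```python
import math
from typing import Dict, List, Sequence


def average_models(models: Sequence[Sequence[float]]) -> List[float]:
    """Average linear models coefficient by coefficient.

    Each model is [intercept, w1, w2, ...]; all must have the same length.
    (Hypothetical representation, assumed for this sketch.)
    """
    width = len(models[0])
    if any(len(m) != width for m in models):
        raise ValueError("all models must have the same number of coefficients")
    return [sum(m[k] for m in models) / len(models) for k in range(width)]


def predict_rms(model: Sequence[float], descriptors: Sequence[float]) -> float:
    """Predicted C-alpha RMS for one candidate: intercept + weighted descriptors."""
    return model[0] + sum(w * x for w, x in zip(model[1:], descriptors))


def keep_best_fraction(
    candidates: Dict[str, Sequence[float]],
    model: Sequence[float],
    fraction: float = 0.10,
) -> List[str]:
    """Rank candidates by predicted RMS (lowest first) and keep the top fraction."""
    ranked = sorted(candidates, key=lambda name: predict_rms(model, candidates[name]))
    n_keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:n_keep]
```

Used as a primary filter in this way, the averaged model only decides which candidates pass on to more expensive evaluation; it is not expected to order the surviving candidates reliably among themselves.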
