Hierarchical energy-based approach to protein-structure prediction: Blind-test evaluation with CASP3 targets

A hierarchical approach to protein-structure prediction is proposed that relies exclusively on finding the global minimum of an appropriate potential energy function, without the aid of secondary-structure prediction, multiple-sequence alignment, or threading. The procedure begins with an extensive search of the conformational space of a protein, using our recently developed united-residue off-lattice UNRES force field and the conformational space annealing (CSA) method. The structures obtained in the search are clustered into families and ranked according to their UNRES energy. Structures within a preassigned energy cutoff are gradually converted into an all-atom representation, followed by a limited conformational search at the all-atom level using the electrostatically driven Monte Carlo (EDMC) method and the ECEPP/3 force field including hydration. The approach was tested in blind predictions on seven targets in the CASP3 experiment; five of these were globular proteins ranging in size from 89 to 140 amino acid residues. A comparison of the computed lowest-energy structures with the experimental structures, which were made available only after the predictions had been submitted, shows that large fragments (∼60 residues, representing 45–80% of the respective proteins) of those five globular proteins were predicted with Cα root-mean-square deviations (RMSDs) of 4 to 7 Å, with correct secondary structure and topology. These results constitute an important step toward predicting protein structure from an amino acid sequence solely by global optimization of a potential energy function.

© 2000 John Wiley & Sons, Inc. Int J Quant Chem 77: 90–117, 2000
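The clustering-and-ranking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes single-linkage clustering at a distance threshold and assumes that each family is represented by its lowest-energy member. The names `Conformation`, `single_linkage_families`, and `select_for_refinement` are hypothetical, and the one-dimensional toy coordinates stand in for Cα coordinates, which in practice would be compared by Cα RMSD after optimal superposition.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Conformation:
    name: str
    energy: float            # reduced-model energy (illustrative units only)
    coords: Sequence[float]  # toy coordinates; hypothetical stand-in for C-alpha positions


def single_linkage_families(
    confs: List[Conformation],
    distance: Callable[[Conformation, Conformation], float],
    threshold: float,
) -> List[List[Conformation]]:
    """Assumed clustering rule: two conformations share a family if a chain of
    pairs, each closer than `threshold`, connects them (single linkage)."""
    parent = list(range(len(confs)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(confs)):
        for j in range(i + 1, len(confs)):
            if distance(confs[i], confs[j]) < threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[Conformation]] = {}
    for i, conf in enumerate(confs):
        groups.setdefault(find(i), []).append(conf)
    return list(groups.values())


def select_for_refinement(
    families: List[List[Conformation]], energy_cutoff: float
) -> List[Conformation]:
    """Rank families by their lowest energy and return the lowest-energy member
    of every family lying within `energy_cutoff` of the global minimum."""
    reps = sorted(
        (min(fam, key=lambda c: c.energy) for fam in families),
        key=lambda c: c.energy,
    )
    if not reps:
        return []
    e_min = reps[0].energy
    return [c for c in reps if c.energy - e_min <= energy_cutoff]


def toy_distance(x: Conformation, y: Conformation) -> float:
    # Euclidean distance between toy coordinates; a real pipeline would use
    # a superposition-based C-alpha RMSD here instead.
    return math.dist(x.coords, y.coords)


if __name__ == "__main__":
    confs = [
        Conformation("a", -120.0, [0.0]),
        Conformation("b", -118.5, [0.5]),
        Conformation("c", -110.0, [5.0]),
        Conformation("d", -101.0, [9.0]),
        Conformation("e", -115.0, [5.4]),
    ]
    families = single_linkage_families(confs, toy_distance, threshold=1.0)
    chosen = select_for_refinement(families, energy_cutoff=10.0)
    print([c.name for c in chosen])  # ['a', 'e']
```

Here the toy data form three families, {a, b}, {c, e}, and {d}; only the first two have lowest energies within the 10-unit cutoff, so their representatives a and e would be passed on to the all-atom stage. Single linkage was picked for the sketch because it needs only pairwise distances and a threshold; swapping in a different distance measure requires changing only the `distance` argument.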
