Structural refinement of protein segments containing secondary structure elements: Local sampling, knowledge‐based potentials, and clustering

In this article, we present an iterative, modular optimization (IMO) protocol for the local structure refinement of protein segments containing secondary structure elements (SSEs). The protocol is based on three modules: a torsion‐space local sampling algorithm, a knowledge‐based potential, and a conformational clustering algorithm. Alternative methods were tested for each module. For each segment, random initial conformations were constructed by perturbing the native dihedral angles of the loops (and SSEs) of the segment to be refined while keeping the protein body fixed. Two refinement procedures based on molecular mechanics force fields, using either energy minimization or molecular dynamics, were also tested but were found to be less successful than the IMO protocol. We found that DFIRE is a particularly effective knowledge‐based potential and that clustering algorithms biased by the DFIRE energies improve the overall results. Results improved further when an energy minimization step was applied to the conformations generated with the IMO procedure, suggesting that hybrid strategies combining knowledge‐based and physical effective energy functions may prove particularly effective in future applications. Proteins 2006. © 2006 Wiley‐Liss, Inc.
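The interplay of the three modules can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the scoring function `toy_energy` is a simple cosine penalty standing in for a knowledge‐based potential such as DFIRE, the sampler applies unconstrained Gaussian perturbations to torsion angles (a real protocol must keep the segment attached to the fixed protein body), and the clustering is a simple greedy scheme that visits conformations in order of increasing energy. All function names and parameter values are hypothetical.

```python
import math
import random


def toy_energy(angles, target):
    """Stand-in score (NOT DFIRE): cosine penalty on torsion deviations."""
    return sum(1.0 - math.cos(a - t) for a, t in zip(angles, target))


def perturb(angles, step, rng):
    """Local sampling: Gaussian perturbation of every torsion angle."""
    return [a + rng.gauss(0.0, step) for a in angles]


def angular_distance(a, b):
    """RMS difference between two torsion vectors, with angle wrapping."""
    total = 0.0
    for x, y in zip(a, b):
        d = math.atan2(math.sin(x - y), math.cos(x - y))
        total += d * d
    return math.sqrt(total / len(a))


def cluster_representatives(confs, energies, radius, n_keep):
    """Energy-biased greedy clustering.

    Conformations are visited from lowest to highest energy; one becomes a
    new cluster center only if it is farther than `radius` from all centers.
    """
    order = sorted(range(len(confs)), key=lambda i: energies[i])
    centers = []
    for i in order:
        if all(angular_distance(confs[i], confs[c]) > radius for c in centers):
            centers.append(i)
        if len(centers) == n_keep:
            break
    return [confs[i] for i in centers]


def imo_refine(start_confs, energy, n_iter=10, n_samples=20,
               step=0.2, radius=0.1, n_keep=5, seed=0):
    """Iterate: sample around the pool, score, cluster, keep representatives."""
    rng = random.Random(seed)
    pool = list(start_confs)
    for _ in range(n_iter):
        candidates = list(pool)  # keep current pool so the best is never lost
        for conf in pool:
            candidates.extend(perturb(conf, step, rng) for _ in range(n_samples))
        energies = [energy(c) for c in candidates]
        pool = cluster_representatives(candidates, energies, radius, n_keep)
    return min(pool, key=energy)


if __name__ == "__main__":
    rng = random.Random(1)
    native = [0.5, -1.0, 2.0, 0.0]  # hypothetical "native" torsions (radians)
    starts = [perturb(native, 1.0, rng) for _ in range(5)]
    best = imo_refine(starts, lambda c: toy_energy(c, native))
    print("best toy energy:", toy_energy(best, native))
```

Because each iteration carries the current pool into the candidate set and the lowest‐energy candidate always becomes the first cluster center, the best score in the pool never increases from one iteration to the next. Swapping in a different sampler, potential, or clustering rule only requires replacing the corresponding function, which mirrors the modular design described above.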
