Progress in super long loop prediction

Sampling errors are very common in the prediction of super long loops (defined here as loops of more than thirteen residues), simply because the sampling space is vast. We have developed a dipeptide segment sampling algorithm to address this problem. As a first step in evaluating its performance, the algorithm was applied to the problem of reconstructing loops in native protein structures. On a newly constructed test set of 89 loops ranging from 14 to 17 residues, the method obtains average/median global backbone root‐mean‐square deviations (RMSDs) to the native structure (superimposing the body of the protein, not the loop itself) of 1.46/0.68 Å. By loop length, the results are 1.19/0.67 Å for 36 fourteen‐residue loops, 1.55/0.75 Å for 30 fifteen‐residue loops, 1.43/0.80 Å for 14 sixteen‐residue loops, and 2.30/1.92 Å for nine seventeen‐residue loops. In the vast majority of cases, the method locates energy minima that are lower than or equal to that of the minimized native loop, indicating that the new sampling method is successful and rarely limits prediction accuracy. Median RMSDs are substantially lower than the averages because of a small number of outliers. The causes of these failures are examined in some detail, and some can be attributed to flaws in the energy function; for example, π–π interactions are not accurately accounted for by the OPLS‐AA force field employed in this study. With a new energy model that gives a superior description of π–π interactions, significantly better results were achieved for quite a few of the former outliers. Crystal packing is explicitly included in order to provide a fair comparison with crystal structures. Proteins 2011. © 2011 Wiley‐Liss, Inc.
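
As a quick consistency check on the figures quoted above, the per-length average RMSDs can be pooled, weighted by the number of loops of each length, to recover the overall average. The sketch below is illustrative only and is not part of the published method; the names `results` and `pooled_mean` are hypothetical. Only the averages can be pooled this way: the overall median (0.68 Å) cannot be derived from the per-length medians without the individual loop RMSDs.

```python
# Per-length results quoted in the abstract:
# loop length -> (number of loops, average global backbone RMSD in Å)
results = {
    14: (36, 1.19),
    15: (30, 1.55),
    16: (14, 1.43),
    17: (9, 2.30),
}


def pooled_mean(groups):
    """Return (total loop count, loop-count-weighted mean of the group averages)."""
    total_loops = sum(n for n, _ in groups.values())
    weighted_sum = sum(n * mean for n, mean in groups.values())
    return total_loops, weighted_sum / total_loops


total_loops, overall = pooled_mean(results)
print(total_loops, round(overall, 2))  # prints "89 1.46"
```

The pooled value matches the reported overall average of 1.46 Å over 89 loops, so the per-length breakdown and the headline figure agree.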
