Protein side-chain packing problem: is there still room for improvement?

The protein side-chain packing problem (PSCPP) is an important subproblem of both protein structure prediction and protein design. Over the past two decades, a large number of methods have been proposed to tackle it. These methods consist of three main components: a rotamer library, a scoring function and a search strategy. The average overall accuracy obtained by these methods is approximately 87%, and whether a higher accuracy can be achieved remains an open question. To address this question, we calculated the maximum accuracy attainable with a simple rotamer library, independently of the energy function or the search method. Using 2883 different structures from the Protein Data Bank, we compared this upper bound with the accuracy of five state-of-the-art methods. These comparisons indicated that, for buried residues, we are already close to the best possible accuracy. For exposed residues, however, we found a significant gap between the accuracy of current methods and the maximum attainable accuracy. Having established that an improvement is possible, the next step is to understand which limitations prevent us from achieving it. Previous work on protein structure prediction and protein design has shown that scoring function inaccuracies may be the main obstacle to better results for these problems. To test whether the same holds for the PSCPP, we evaluated the quality of two scoring functions used by some state-of-the-art algorithms. Our results indicate that neither scoring function can guide the search method correctly, reinforcing the idea that efforts to solve the PSCPP must also focus on developing better scoring functions.
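The upper bound described above does not depend on any scoring function or search strategy: if an oracle that knows the native structure picks a correct library rotamer for every residue where one exists, no method restricted to the same library can do better. The Python sketch below illustrates that idea only; it is not the authors' implementation. It assumes that native chi angles have already been extracted from the structures, that each residue comes with its own list of candidate rotamers (this data layout and the toy values are hypothetical), and that a dihedral counts as correct when it lies within 40 degrees of the native value, a criterion common in the side-chain prediction literature but not stated in this abstract.

```python
from typing import List, Sequence, Tuple

# ASSUMPTION: a dihedral counts as correct when it lies within 40 degrees of the
# native value (a common PSCPP criterion; the abstract does not state its cutoff).
TOLERANCE_DEG = 40.0

ChiAngles = Tuple[float, ...]  # hypothetical layout: (chi1, chi2, ...) in degrees


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def matches(rotamer: Sequence[float], native: Sequence[float], n_chi: int) -> bool:
    """True if the first n_chi dihedrals of the rotamer are within tolerance of native."""
    return all(
        angular_diff(r, x) <= TOLERANCE_DEG
        for r, x in zip(rotamer[:n_chi], native[:n_chi])
    )


def max_attainable_accuracy(
    natives: List[ChiAngles],
    libraries: List[List[ChiAngles]],
    n_chi: int = 1,
) -> float:
    """Fraction of residues for which at least one library rotamer is correct.

    An oracle that picks a correct rotamer whenever one exists reaches exactly this
    value, so no scoring function or search strategy limited to the same library
    can exceed it.
    """
    if not natives or len(natives) != len(libraries):
        raise ValueError("natives must be non-empty and match libraries in length")
    hits = sum(
        any(matches(rot, nat, n_chi) for rot in lib)
        for nat, lib in zip(natives, libraries)
    )
    return hits / len(natives)


if __name__ == "__main__":
    # Toy data (hypothetical values, not from the paper's 2883-structure set).
    natives = [(-60.0,), (180.0,), (65.0,)]
    libraries = [[(-65.0,), (180.0,)], [(-175.0,)], [(-65.0,), (180.0,)]]
    print(f"{max_attainable_accuracy(natives, libraries):.3f}")  # prints 0.667
```

Because the oracle looks at the native side chains, this is a ceiling rather than a prediction method: a real packer replaces the `any(...)` check with a choice driven by a scoring function and a search strategy, and the gap between its accuracy and this ceiling is the room for improvement the abstract refers to.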
