ProQ3: Improved model quality assessments using Rosetta energy terms

Quality assessment of protein models using no other information than the structure of the model itself has been shown to be useful for structure prediction. Here, we introduce two novel methods, ProQRosFA and ProQRosCen, inspired by the state-of-art method ProQ2, but using a completely different description of a protein model. ProQ2 uses contacts and other features calculated from a model, while the new predictors are based on Rosetta energies: ProQRosFA uses the full-atom energy function that takes into account all atoms, while ProQRosCen uses the coarse-grained centroid energy function. The two new predictors also include residue conservation and terms corresponding to the agreement of a model with predicted secondary structure and surface area, as in ProQ2. We show that the performance of these predictors is on par with ProQ2 and significantly better than all other model quality assessment programs. Furthermore, we show that combining the input features from all three predictors, the resulting predictor ProQ3 performs better than any of the individual methods. ProQ3, ProQRosFA and ProQRosCen are freely available both as a webserver and stand-alone programs at http://proq3.bioinfo.se/.

[1]  Jens Meiler,et al.  ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. , 2011, Methods in enzymology.

[2]  Zheng Wang,et al.  Benchmarking Deep Networks for Predicting Residue-Specific Quality of Individual Protein Models in CASP11 , 2016, Scientific Reports.

[3]  Arne Elofsson,et al.  Identification of correct regions in protein models using structural, alignment, and consensus information , 2006, Protein science : a publication of the Protein Society.

[4]  D E Neves,et al.  Monte Carlo calculations on polypeptide chains. VIII. Distribution functions for the end-to-end distance and radius of gyration for hard-sphere models of randomly coiling poly(glycine) and poly(L-alanine). , 1975, Macromolecules.

[5]  Juergen Haas,et al.  The Protein Model Portal—a comprehensive resource for protein structure and model information , 2013, Database J. Biol. Databases Curation.

[6]  A. Sali,et al.  Statistical potential for assessment and prediction of protein structures , 2006, Protein science : a publication of the Protein Society.

[7]  J Lundström,et al.  Pcons: A neural‐network–based consensus predictor that improves fold recognition , 2001, Protein science : a publication of the Protein Society.

[8]  Pierre Baldi,et al.  SCRATCH: a protein structure and structural feature prediction server , 2005, Nucleic Acids Res..

[9]  Yaoqi Zhou,et al.  Specific interactions for ab initio folding of protein terminal regions with secondary structures , 2008, Proteins.

[10]  A. Elofsson,et al.  Can correct protein models be identified? , 2003, Protein science : a publication of the Protein Society.

[11]  Renzhi Cao,et al.  Protein single-model quality assessment by feature-based probability density functions , 2016, Scientific Reports.

[12]  Karolis Uziela,et al.  ProQ2: estimation of model accuracy implemented in Rosetta , 2016, Bioinform..

[13]  Arne Elofsson,et al.  Pcons5: combining consensus, structural evaluation and fold recognition scores , 2005, Bioinform..

[14]  Peter B. McGarvey,et al.  UniRef: comprehensive and non-redundant UniProt reference clusters , 2007, Bioinform..

[15]  Arne Elofsson,et al.  Automatic consensus‐based fold recognition using Pcons, ProQ, and Pmodeller , 2003, Proteins.

[16]  G. Casari,et al.  Identification of native protein folds amongst a large number of incorrect models. The calculation of low energy conformations from potentials of mean force. , 1990, Journal of molecular biology.

[17]  R Dustin Schaeffer,et al.  CASP 11 target classification , 2016, Proteins.

[18]  Arne Elofsson,et al.  Assessment of global and local model quality in CASP8 using Pcons and ProQ , 2009, Proteins.

[19]  M Levitt,et al.  Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins , 1998, Protein science : a publication of the Protein Society.

[20]  Arne Elofsson,et al.  Prediction of global and local model quality in CASP7 using Pcons and ProQ , 2007, Proteins.

[21]  D. T. Jones,et al.  A new approach to protein fold recognition , 1992, Nature.

[22]  Arne Elofsson,et al.  3D-Jury: A Simple Approach to Improve Protein Structure Predictions , 2003, Bioinform..

[23]  T. Yeates,et al.  Verification of protein structures: Patterns of nonbonded atomic interactions , 1993, Protein science : a publication of the Protein Society.

[24]  Thorsten Joachims,et al.  Learning to classify text using support vector machines - methods, theory and algorithms , 2002, The Kluwer international series in engineering and computer science.

[25]  D. Eisenberg,et al.  Assessment of protein models with three-dimensional profiles , 1992, Nature.

[26]  David E. Kim,et al.  Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta , 2016, Proteins.

[27]  P. Argos,et al.  Knowledge‐based protein secondary structure assignment , 1995, Proteins.

[28]  Pascal Benkert,et al.  QMEAN: A comprehensive scoring function for model quality assessment , 2008, Proteins.

[29]  C Kooperberg,et al.  Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. , 1997, Journal of molecular biology.

[30]  Zheng Wang,et al.  Designing and evaluating the MULTICOM protein local and global model quality prediction methods in the CASP10 experiment , 2014, BMC Structural Biology.

[31]  Arne Elofsson,et al.  A study of quality measures for protein threading models , 2001, BMC Bioinformatics.

[32]  Chih-Jen Lin,et al.  A Practical Guide to Support Vector Classication , 2008 .

[33]  Björn Wallner,et al.  Improved model quality assessment using ProQ2 , 2012, BMC Bioinformatics.

[34]  Renzhi Cao,et al.  SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines , 2013, BMC Bioinformatics.

[35]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[36]  I. Longden,et al.  EMBOSS: the European Molecular Biology Open Software Suite. , 2000, Trends in genetics : TIG.

[37]  Anna Tramontano,et al.  Methods of model accuracy estimation can help selecting the best models from decoy sets: Assessment of model accuracy estimations in CASP11 , 2016, Proteins.

[38]  Liam J. McGuffin,et al.  Improving sequence-based fold recognition by using 3D model quality assessment , 2005, Bioinform..

[39]  A. Zeileis,et al.  zoo: S3 Infrastructure for Regular and Irregular Time Series , 2005, math/0505527.