SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines

Background: It is important to predict the quality of a protein structural model before its native structure is known. Methods that predict the absolute local quality of individual residues in a single protein model are rare, yet they are particularly needed for using, ranking, and refining protein models.

Results: We developed a machine learning tool (SMOQ) that predicts the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with a basic feature set of protein sequence and structural features, including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts, to make predictions. We also trained an SVM model with two additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them improves performance only when the real deviation between the native structure and the model is higher than 5 Å. The released version of SMOQ uses the basic feature set and is trained on 85 CASP8 targets. Moreover, SMOQ implements a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. On this test data, the average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation is 2.637 Å. The global quality prediction accuracy of the tool is comparable to that of other good tools on the same benchmark.

Conclusion: SMOQ is a useful tool for protein single-model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.
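The abstract does not state how local scores are converted into a global score, so the following Python sketch is only an illustration under stated assumptions. It assumes an S-score-style transformation in which each predicted per-residue deviation d (in Å) contributes 1 / (1 + (d / T)^2) and the contributions are averaged over the model; the threshold T = 5 Å and the function names `global_score` and `mean_absolute_error` are hypothetical choices made for this example, not taken from the SMOQ source code. The second function shows how an average difference between predicted and actual deviations, such as the 2.637 Å figure reported above, could be computed.

```python
from typing import Sequence


def global_score(deviations: Sequence[float], threshold: float = 5.0) -> float:
    """Average an S-score-style term over all residues (assumed formula).

    deviations: predicted per-residue distance deviations in angstroms.
    threshold:  assumed scaling distance T in angstroms (illustrative default).
    Returns a value in (0, 1]; 1.0 means every residue has zero deviation.
    """
    if not deviations:
        raise ValueError("at least one residue deviation is required")
    terms = [1.0 / (1.0 + (d / threshold) ** 2) for d in deviations]
    return sum(terms) / len(terms)


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and actual deviations."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Made-up deviations for a four-residue toy model, for illustration only.
    predicted = [1.2, 0.8, 6.5, 3.0]
    actual = [1.0, 1.1, 9.0, 2.5]
    print(f"global score: {global_score(predicted):.3f}")
    print(f"mean absolute error: {mean_absolute_error(predicted, actual):.3f} A")
```

With this assumed form, residues predicted to deviate far beyond the threshold contribute little to the global score, so a few badly placed residues lower the score without dominating it.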
