SVMQA: support‐vector‐machine‐based protein single‐model quality assessment

Motivation: Accurately ranking predicted structural models and selecting the best model from a given candidate pool remain open problems in structural bioinformatics. The quality assessment (QA) methods used to address these problems fall into two categories: consensus methods and single‐model methods. Consensus methods generally perform better and attain higher correlation between predicted and true quality measures. However, they frequently fail to generate proper quality scores for native‐like structures that are distinct from the rest of the pool. Single‐model methods do not suffer from this drawback and are better suited to real‐life applications in which many models from various sources may not be readily available.

Results: In this study, we developed a support‐vector‐machine‐based single‐model global quality assessment (SVMQA) method. For a given protein model, SVMQA predicts the TM‐score and the GDT_TS score from a feature vector that contains statistical potential energy terms and consistency‐based terms, the latter comparing the actual structural features (extracted from the three‐dimensional coordinates) with the values predicted from the primary sequence. We trained SVMQA on CASP8, CASP9 and CASP10 targets and determined the machine parameters by 10‐fold cross‐validation. We evaluated SVMQA on various benchmarking datasets. The results show that SVMQA outperformed the best existing single‐model QA methods both in ranking the provided protein models and in selecting the best model from the pool. According to the CASP12 assessment, SVMQA was the best method for selecting good‐quality models from decoys in terms of GDTloss.

Availability and implementation: The SVMQA method can be freely downloaded from http://lee.kias.re.kr/SVMQA/SVMQA_eval.tar.gz.

Contact: jlee@kias.re.kr

Supplementary information: Supplementary data are available at Bioinformatics online.
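The two evaluation criteria named in the Results, ranking quality and best‐model selection, can be illustrated with a minimal Python sketch. It is not the SVMQA code. It assumes that ranking is measured by the Pearson correlation between predicted and true scores, and that GDTloss is the gap between the true GDT_TS of the best model in the pool and the true GDT_TS of the model ranked first by the predictor, which is how the term is commonly used in CASP assessments. The function names and the four‐model pool are hypothetical, and scores are on a 0 to 1 scale.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def gdt_loss(predicted, true_gdt):
    """Assumed definition: best true GDT_TS in the pool minus the true
    GDT_TS of the model that the predictor ranks first."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true_gdt) - true_gdt[top]


# Hypothetical pool of four models for one target (illustrative values only).
predicted_scores = [0.62, 0.48, 0.71, 0.55]  # scores assigned by a QA method
true_scores = [0.66, 0.45, 0.64, 0.58]       # true GDT_TS against the native

ranking_quality = pearson(predicted_scores, true_scores)
selection_loss = gdt_loss(predicted_scores, true_scores)
print(f"Pearson r = {ranking_quality:.3f}, GDTloss = {selection_loss:.3f}")
```

A GDTloss of zero means the predictor picked the best model in the pool. A method can rank a pool well overall and still incur a nonzero loss when it misorders the top few models, which is why the two criteria are reported separately.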

[18]  Arne Elofsson,et al.  3D-Jury: A Simple Approach to Improve Protein Structure Predictions , 2003, Bioinform..

[19]  Yaoqi Zhou,et al.  Ab initio folding of terminal segments with secondary structures reveals the fine difference between two closely related all‐atom statistical energy functions , 2008, Protein science : a publication of the Protein Society.

[20]  Yaoqi Zhou,et al.  Specific interactions for ab initio folding of protein terminal regions with secondary structures , 2008, Proteins.

[21]  A. Sali,et al.  Comparative protein structure modeling by iterative alignment, model building and model assessment. , 2003, Nucleic Acids Research.

[22]  Ruqian Lu,et al.  Sorting protein decoys by machine-learning-to-rank , 2016, Scientific Reports.

[23]  Daniel B. Roche,et al.  Assessing the quality of modelled 3D protein structures using the ModFOLD server. , 2014, Methods in molecular biology.

[24]  Yang Zhang,et al.  How significant is a protein structure similarity with TM-score = 0.5? , 2010, Bioinform..

[25]  Marcin J. Skwark,et al.  PconsD: ultra rapid, accurate model quality assessment for protein structure prediction , 2013, Bioinform..

[26]  Yang Zhang,et al.  3DRobot: automated generation of diverse and well-packed protein structure decoys , 2016, Bioinform..

[27]  Yang Zhang,et al.  Scoring function for automated assessment of protein structure template quality , 2004, Proteins.

[28]  D. Baker,et al.  Improved recognition of native‐like protein structures using a combination of sequence‐dependent and sequence‐independent features of proteins , 1999, Proteins.

[29]  Björn Wallner,et al.  Improved model quality assessment using ProQ2 , 2012, BMC Bioinformatics.

[30]  Arne Elofsson,et al.  Assessment of global and local model quality in CASP8 using Pcons and ProQ , 2009, Proteins.

[31]  Keehyoung Joo,et al.  Contact‐assisted protein structure modeling by global optimization in CASP11 , 2016, Proteins.

[32]  Jianlin Cheng,et al.  Evaluating the absolute quality of a single protein model using structural features and support vector machines , 2009, Proteins.

[33]  International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome , 2001, Nature.

[34]  Yang Zhang,et al.  A Novel Side-Chain Orientation Dependent Potential Derived from Random-Walk Reference State for Protein Fold Selection and Structure Prediction , 2010, PloS one.

[35]  Jooyoung Lee,et al.  Hidden Information Revealed by Optimal Community Structure from a Protein-Complex Bipartite Network Improves Protein Function Prediction , 2013, PloS one.

[36]  A. Sali,et al.  Protein Structure Prediction and Structural Genomics , 2001, Science.

[37]  Jianpeng Ma,et al.  OPUS-PSP: an orientation-dependent statistical all-atom potential derived from side-chain packing. , 2008, Journal of molecular biology.

[38]  Zheng Wang,et al.  Designing and evaluating the MULTICOM protein local and global model quality prediction methods in the CASP10 experiment , 2014, BMC Structural Biology.