Predicting Protein Model Quality from Sequence Alignments by Support Vector Machines

Assessing the quality of a protein structure model is essential for protein structure prediction. Here, we developed a Support Vector Machine (SVM) method that predicts the quality score (GDT-TS score) of a protein structure model from features extracted from the sequence alignment used to generate the model. The method takes either a query-single-template pairwise alignment or a query-multitemplate alignment as input. For the pairwise alignment scheme, the input features fed into the SVM predictor include the normalized e-value of the alignment, the percentage of identical residue pairs in the alignment, the percentage of query residues aligned with template residues, and the sum of the BLOSUM scores of all aligned residue pairs divided by the number of aligned positions. Similarly, for the multiple-alignment scheme, the input features include the percentage of target-sequence residues aligned with residues in one or more templates, the percentage of aligned target residues that are identical to the corresponding residue in any one template, the average BLOSUM score of aligned residues, and the average Gonnet160 score of aligned residues. An SVM regression predictor was trained on the training data to predict the GDT-TS scores of the models from these input features. Performance was evaluated by the Root Mean Square Error (RMSE) and the Absolute Mean Error (ABS) between predicted and real GDT-TS scores. Five-fold cross-validation was used to select the best parameter values based on the average RMSE and ABS over the five folds. The RMSE and ABS of the optimized SVM predictor on the testing data were both close to 0.1. This performance indicates that integrating sequence alignment features with an SVM is effective for protein model quality assessment.
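As an illustration only, the pairwise-alignment features and the two error measures described above might be computed along the following lines. This is a minimal sketch and not the authors' implementation. The function names, the gapped-string input format, the choice of aligned pairs as the denominator for identity, and the `TOY_SCORES` table (placeholder values standing in for a full BLOSUM matrix) are all assumptions. The e-value feature is left out because the abstract does not say how it is normalized.

```python
import math

# Placeholder substitution scores for illustration; a real run would load a
# full BLOSUM matrix instead of this hypothetical four-entry table.
TOY_SCORES = {("A", "A"): 4, ("K", "R"): 2, ("L", "L"): 4, ("G", "G"): 6}


def pairwise_features(query_aln, template_aln, scores):
    """Compute identity, coverage and average substitution score.

    Both inputs are gapped strings of equal length ('-' marks a gap).
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    query_len = sum(1 for c in query_aln if c != "-")
    # Keep only columns where both sequences have a residue.
    aligned = [(q, t) for q, t in zip(query_aln, template_aln)
               if q != "-" and t != "-"]
    if query_len == 0 or not aligned:
        return {"identity": 0.0, "coverage": 0.0, "avg_score": 0.0}
    identical = sum(1 for q, t in aligned if q == t)
    # Look up each pair in either order; unknown pairs score 0 (assumption).
    total = sum(scores.get((q, t), scores.get((t, q), 0)) for q, t in aligned)
    return {
        "identity": identical / len(aligned),    # assumed denominator
        "coverage": len(aligned) / query_len,    # query residues aligned
        "avg_score": total / len(aligned),       # score sum / aligned positions
    }


def rmse(predicted, real):
    """Root mean square error between two equal-length score lists."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, real))
                     / len(real))


def abs_error(predicted, real):
    """Mean of the absolute differences (the ABS measure in the text)."""
    return sum(abs(p - r) for p, r in zip(predicted, real)) / len(real)


features = pairwise_features("AKLLG", "AR-LG", TOY_SCORES)
```

The resulting feature dictionary, together with a normalized e-value, would form one input vector for a regression model. Any support vector regression implementation could then be fitted to such vectors, with the RMSE and ABS averaged over five folds to choose its parameters.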
