Protein model quality assessment by learning-to-rank

Protein structures are essential for understanding protein function. Predicted structural models vary widely in accuracy, so reliable estimates of model quality are critical for determining whether a model is useful for addressing a specific problem. In this study, we present a novel method that ranks models by their relative quality. The method first extracts various features from the three-dimensional structures of the protein models and then uses a learning-to-rank algorithm to rank the models by their similarity to the native structure. Furthermore, we present a quasi single-model method that takes the top five models identified in this way as references and ranks the remaining models by their average similarity to these reference models. Benchmark tests were performed on a decoy set produced by a newly developed, template-based decoy generator; the set covers all the main structural classes of proteins. The proposed learning-to-rank method achieves an average Pearson correlation coefficient (PCC) of 0.94 and an area under the curve (AUC) of 0.97, consistently outperforming all the other well-established methods compared. The quasi single-model method improves performance further, achieving nearly perfect results with both a PCC and an AUC of 0.99. These results demonstrate that the proposed method is effective for model quality assessment and achieves state-of-the-art performance.
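The two-stage pipeline described above (learning to rank, then quasi single-model re-scoring) can be illustrated with a minimal Python sketch. The text does not specify the structural features, the learning-to-rank algorithm, the similarity measure, or the quality cutoff used for AUC, so all of these are assumptions here: a linear score trained with a pairwise logistic (RankNet-style) loss stands in for the learning-to-rank step, a precomputed model-to-model similarity matrix (for example TM-scores) stands in for the structural comparison, and every function name is hypothetical. This is a sketch of the general technique, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence


def train_pairwise_ranker(features: List[List[float]], quality: List[float],
                          epochs: int = 200, lr: float = 0.1,
                          seed: int = 0) -> List[float]:
    """Fit weights w of a linear score s(x) = w . x with a pairwise logistic loss.

    For each pair (i, j) where model i is closer to the native than model j,
    the loss log(1 + exp(-(s_i - s_j))) pushes s_i above s_j (RankNet-style).
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    pairs = [(i, j) for i in range(len(quality)) for j in range(len(quality))
             if quality[i] > quality[j]]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j in pairs:
            diff = [a - b for a, b in zip(features[i], features[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            # -d/dmargin of log(1 + exp(-margin)) is sigmoid(-margin);
            # the exponent is clamped only to avoid overflow for large margins.
            g = 1.0 / (1.0 + math.exp(min(margin, 50.0)))
            w = [wk + lr * g * dk for wk, dk in zip(w, diff)]
    return w


def score_models(w: Sequence[float], features: List[List[float]]) -> List[float]:
    """Predicted quality score for each model (higher = closer to native)."""
    return [sum(wk * xk for wk, xk in zip(w, x)) for x in features]


def quasi_single_model(scores: Sequence[float], similarity: List[List[float]],
                       n_ref: int = 5) -> List[float]:
    """Re-score every model by its mean similarity to the n_ref top-ranked models.

    similarity[i][j] is a precomputed structural similarity (e.g. TM-score)
    between models i and j. A reference model is not compared with itself.
    """
    if min(n_ref, len(scores)) < 2:
        raise ValueError("need at least two reference models")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    refs = order[:n_ref]
    new_scores = []
    for i in range(len(scores)):
        others = [r for r in refs if r != i]
        new_scores.append(sum(similarity[i][r] for r in others) / len(others))
    return new_scores


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (PCC) between predicted and true quality."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def auc(scores: Sequence[float], is_good: Sequence[bool]) -> float:
    """ROC AUC as the probability that a good model outscores a bad one (ties = 0.5)."""
    pos = [s for s, g in zip(scores, is_good) if g]
    neg = [s for s, g in zip(scores, is_good) if not g]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: true quality q and one informative plus one constant feature.
    q = [k / 10 for k in range(1, 11)]
    feats = [[qi, 0.5] for qi in q]
    s = score_models(train_pairwise_ranker(feats, q), feats)
    sim = [[1.0 - abs(a - b) for b in q] for a in q]  # stand-in for TM-scores
    qs = quasi_single_model(s, sim, n_ref=5)
    good = [qi >= 0.5 for qi in q]  # hypothetical "good model" cutoff
    print("PCC:", pearson(s, q), "AUC:", auc(s, good))
    print("quasi single-model PCC:", pearson(qs, q))
```

In this sketch the learned ranking is used only to choose the reference models; the final quasi single-model score of every model comes from structural similarity to those references, which gives a consensus-like signal from just a handful of models. How the reference models themselves are scored is not stated in the text, so comparing each reference only with the other references is an assumption of the sketch.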
