MQAPRank: improved global protein model quality assessment by learning-to-rank

Background
Protein structure prediction has made considerable progress over the last few decades, and many candidate models can now be generated for a given sequence. Consequently, assessing the quality of predicted protein models without knowledge of the native structure is a key component of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue. They can be roughly divided into three categories: single-model methods, quasi-single-model methods, and clustering (or consensus) methods. Although these methods have achieved varying degrees of success, accurate protein model quality assessment remains an open problem.

Results
Here, we present MQAPRank, a global protein model quality assessment program based on learning-to-rank. MQAPRank first ranks the decoy models of a target protein with a single-model method built on a learning-to-rank algorithm, which indicates their relative quality. It then takes the top five models as references and predicts the quality of each remaining model as its average GDT_TS score against these reference models. On the CASP11 and 3DRobot datasets, MQAPRank performed better than other leading protein model quality assessment methods. MQAPRank also participated in CASP12 under the group name FDUBio and achieved state-of-the-art performance.

Conclusions
MQAPRank is a convenient and powerful tool for protein model quality assessment with state-of-the-art performance, and it is useful for both protein structure prediction and model quality assessment.
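The two-stage procedure described in the Results (a learned single-model ranking, followed by a consensus against the top-ranked references) can be sketched as follows. This is a minimal illustration under stated assumptions, not the MQAPRank implementation: stage 1 uses a linear pairwise hinge-loss ranker in the spirit of RankSVM, trained by plain subgradient descent, which may differ from the learning-to-rank algorithm MQAPRank actually uses; the per-decoy feature vectors are hypothetical; and pairwise GDT_TS values are assumed to be precomputed by an external structure-comparison tool such as LGA or TM-score. How the reference models themselves are scored is not stated in the abstract, so here each reference is compared only with the other references (an assumption).

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical inputs: each decoy is described by a feature vector
# (e.g. statistical-potential terms) and, for training targets only,
# by its true quality (e.g. GDT_TS to the native structure).
Vector = Sequence[float]


def train_pairwise_ranker(
    features: List[Vector],
    true_scores: List[float],
    epochs: int = 200,
    lr: float = 0.01,
    reg: float = 1e-3,
) -> List[float]:
    """Fit a linear scoring vector w with a pairwise hinge loss (RankSVM-style).

    For every pair (i, j) with true_scores[i] > true_scores[j], the loss
    max(0, 1 - w . (x_i - x_j)) penalises ranking decoy j above decoy i.
    """
    dim = len(features[0])
    w = [0.0] * dim
    pairs = [
        (i, j)
        for i in range(len(features))
        for j in range(len(features))
        if true_scores[i] > true_scores[j]
    ]
    for _ in range(epochs):
        grad = [reg * wk for wk in w]  # gradient of (reg / 2) * ||w||^2
        for i, j in pairs:
            diff = [a - b for a, b in zip(features[i], features[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            if margin < 1.0:  # hinge is active: subgradient is -diff
                grad = [g - dk / len(pairs) for g, dk in zip(grad, diff)]
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w


def rank_decoys(w: List[float], decoys: Dict[str, Vector]) -> List[str]:
    """Stage 1: order decoys by the learned single-model score (best first)."""
    score = {name: sum(a * b for a, b in zip(w, x)) for name, x in decoys.items()}
    return sorted(score, key=score.__getitem__, reverse=True)


def reference_consensus(
    ranking: List[str],
    gdt_ts: Dict[Tuple[str, str], float],
    n_ref: int = 5,
) -> Dict[str, float]:
    """Stage 2: predicted quality = mean GDT_TS to the top-ranked references.

    `gdt_ts` holds precomputed pairwise GDT_TS values (assumed symmetric and
    stored under either key order). A reference model is compared only with
    the other references, an assumption not spelled out in the abstract.
    """
    refs = ranking[:n_ref]
    if len(refs) < 2:
        raise ValueError("need at least two reference models")

    def pair(a: str, b: str) -> float:
        return gdt_ts[(a, b)] if (a, b) in gdt_ts else gdt_ts[(b, a)]

    quality = {}
    for model in ranking:
        others = [r for r in refs if r != model]
        quality[model] = sum(pair(model, r) for r in others) / len(others)
    return quality


if __name__ == "__main__":
    # Toy data: one hypothetical feature that correlates with true quality.
    w = train_pairwise_ranker([[1.0], [2.0], [3.0]], [0.1, 0.2, 0.3])
    ranking = rank_decoys(w, {"a": [3.0], "b": [2.5], "c": [0.5]})
    gdt = {("a", "b"): 0.8, ("a", "c"): 0.6, ("b", "c"): 0.4}
    print(ranking, reference_consensus(ranking, gdt, n_ref=2))
```

The default `n_ref=5` mirrors the five reference models mentioned in the abstract; the toy usage lowers it to 2 only because it has three decoys. Splitting the method this way keeps the learned ranker responsible solely for choosing good references, while the final scores come from structural agreement with those references.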
