Comparative analysis of methods for evaluation of protein models against native structures

Motivation: Measuring discrepancies between protein models and native structures is at the heart of the development of protein structure prediction methods and the comparison of their performance. A number of different evaluation methods have been developed; however, a comprehensive and unbiased comparison of them has not been performed.

Results: We carried out a comparative analysis of several popular model assessment methods (RMSD, TM‐score, GDT, QCS, CAD‐score, LDDT, SphereGrinder and RPF) to reveal their relative strengths and weaknesses. The analysis, performed on a large and diverse model set derived in the course of the three latest community‐wide CASP experiments (CASP10‐12), had two major directions. First, we looked at general differences between the scores by analyzing the distribution, correspondence and correlation of their values, as well as differences in selecting the best models. Second, we examined the score differences taking into account various structural properties of models (stereochemistry, hydrogen bonds, packing of domains and chain fragments, missing residues, protein length and secondary structure). Our results provide a solid basis for an informed selection of the most appropriate score or combination of scores depending on the task at hand.

Supplementary information: Supplementary data are available at Bioinformatics online.
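To make the difference between some of the scores named above concrete, the following is a minimal sketch of how RMSD, GDT_TS and TM‐score could be computed from a list of per‐residue Cα distances (in Å) between a model and the native structure. It assumes the distances come from one fixed superposition that has already been computed; real GDT and TM‐score implementations search over many superpositions and report the best value, so this sketch would only give a lower bound for them. The function names, the `target_length` parameter, the inclusive cutoff comparison and the 0.5 Å floor on `d0` are illustrative assumptions, not taken from the paper.

```python
import math


def rmsd(distances):
    """Root-mean-square deviation over the aligned residues."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt_ts(distances, target_length=None):
    """GDT_TS-like score (0-100) under a single, fixed superposition.

    Averages the fractions of target residues lying within
    1, 2, 4 and 8 Angstrom of their native positions.
    """
    n = target_length or len(distances)
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def tm_score(distances, target_length=None):
    """TM-score-like value (0-1) under a single, fixed superposition."""
    n = target_length or len(distances)
    # Length-dependent scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    # Assumption: short targets fall back to 0.5 Angstrom, which also
    # avoids a cube root of a negative number.
    if n > 21:
        d0 = max(0.5, 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8)
    else:
        d0 = 0.5
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / n


if __name__ == "__main__":
    # Hypothetical distances for a 5-residue fragment of a 100-residue target.
    example = [0.5, 1.5, 3.0, 6.0, 10.0]
    print(rmsd(example))
    print(gdt_ts(example))
    print(tm_score(example, target_length=100))
```

In this sketch a single badly placed residue inflates RMSD without bound, whereas it can lower GDT_TS and TM‐score only by a bounded amount. Residues missing from the model are not penalized by RMSD at all, but they do reduce the other two scores when `target_length` is set to the full native length.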
