Model quality assessment using distance constraints from alignments

Given a set of alternative models for a specific protein sequence, the model quality assessment (MQA) problem asks for a score to be assigned to each model in the set. A good MQA program assigns these scores such that they correlate well with the real quality of the models, ideally giving the best score to the model that is closest to the true structure. In this article, we present a new approach to the MQA problem. It is based on distance constraints extracted from alignments to templates of known structure, and is implemented in the Undertaker program for protein structure prediction. One novel feature is that we extract noncontact constraints as well as contact constraints. We describe how the distance constraints are extracted and show how they can be used to address the MQA problem. We have evaluated our method on CASP7 targets, and the results show that it is at least comparable with the best MQA methods that were assessed at CASP7. We also propose a new evaluation measure for MQA, Kendall's τ, that is more interpretable than the conventional measures used for evaluating MQA methods (Pearson's r and Spearman's ρ). We show clear examples where Kendall's τ agrees much better with our intuition of a correct MQA, and we therefore propose that Kendall's τ be used for future CASP MQA assessments. Proteins 2009. © 2008 Wiley‐Liss, Inc.
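
To make the proposed evaluation measure concrete, the following is a minimal sketch of Kendall's τ for one target, written as the naive pairwise count (tau-a, with tied pairs counted as neither concordant nor discordant). The function name `kendall_tau` and the example scores are illustrative assumptions, not taken from the article or from Undertaker, and the article's own handling of ties may differ.

```python
from itertools import combinations


def kendall_tau(predicted, actual):
    """Kendall's tau-a between predicted scores and true quality values.

    Naive O(n^2) pairwise count; assumes higher is better in both lists.
    A tied pair (in either list) counts as neither concordant nor discordant.
    """
    if len(predicted) != len(actual):
        raise ValueError("score lists must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two models to compare")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both lists order models i and j the same way.
        sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical MQA scores and true quality values for four models.
mqa_scores = [0.9, 0.8, 0.4, 0.3]
true_quality = [0.85, 0.70, 0.75, 0.20]
tau = kendall_tau(mqa_scores, true_quality)
```

In this invented example, five of the six model pairs are ordered correctly and one is swapped, so τ = (5 − 1)/6 ≈ 0.67. When there are no ties, (τ + 1)/2 is the fraction of model pairs that the MQA method ranks in the right order, which is one way to read the interpretability claim above.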
