Benchmarking consensus model quality assessment for protein fold recognition

Background

Selecting the highest quality 3D model of a protein structure from a number of alternatives remains an important challenge in the field of structural bioinformatics. Many Model Quality Assessment Programs (MQAPs) have been developed which adopt various strategies to tackle this problem, ranging from the so-called "true" MQAPs, which produce a single energy score from a single model, to methods which rely on structural comparisons of multiple models or on additional information from meta-servers. However, it is clear that no current method can consistently separate the highest accuracy models from the lowest. In this paper, a number of the top performing MQAP methods are benchmarked in the context of the potential value that they add to protein fold recognition. Two novel methods are also described: ModSSEA, which is based on the alignment of predicted secondary structure elements, and ModFOLD, which combines several true MQAP methods using an artificial neural network.

Results

The ModSSEA method is found to be an effective model quality assessment program for ranking multiple models from many servers; however, further accuracy can be gained by using the consensus approach of ModFOLD. The ModFOLD method is shown to significantly outperform the true MQAPs tested and is competitive with methods which make use of clustering or additional information from multiple servers. Several of the true MQAPs are also shown to add value to most individual fold recognition servers by improving model selection when applied as a post filter to re-rank models.

Conclusion

MQAPs should be benchmarked appropriately for the practical context in which they are intended to be used. Clustering based methods are the top performing MQAPs where many models are available from many servers; however, they often do not add value to individual fold recognition servers when only a limited number of models are available. Conversely, the true MQAP methods tested can often be used as effective post filters for re-ranking a few models from individual fold recognition servers, and further improvements can be achieved using a consensus of these methods.
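
As a rough illustration of the post-filter idea described above, the following sketch re-ranks a handful of models from one server using a consensus of several per-model quality scores. It is not the ModFOLD method: the abstract states that ModFOLD combines true MQAP scores with an artificial neural network, whereas this example substitutes a plain mean of z-scores as a stand-in. The function names, the method labels `mqap_a` and `mqap_b`, and all score values are hypothetical, and every score is assumed to be oriented so that higher means better.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise one method's scores so that methods on different scales are comparable."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All models scored identically by this method: it carries no ranking signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def consensus_rerank(models, score_table):
    """Re-rank models by the mean z-score across methods (a stand-in for a learned combiner).

    models      -- list of model identifiers
    score_table -- dict mapping a method name to a list of scores aligned with `models`
                   (assumption: higher score means better predicted quality)
    """
    per_method = [zscores(scores) for scores in score_table.values()]
    # zip(*per_method) regroups the standardised scores by model rather than by method.
    consensus = [mean(column) for column in zip(*per_method)]
    return sorted(zip(models, consensus), key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for three models returned by one fold recognition server.
models = ["model_1", "model_2", "model_3"]
scores = {
    "mqap_a": [0.30, 0.55, 0.40],
    "mqap_b": [0.20, 0.70, 0.65],
}

ranked = consensus_rerank(models, scores)
for name, score in ranked:
    print(f"{name}\t{score:+.3f}")
```

Standardising each method before averaging keeps a method with a wide numeric range from dominating the consensus. A trained combiner such as the neural network mentioned in the abstract would instead learn how much weight each method deserves.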
