Rapid model quality assessment for protein structure predictions using the comparison of multiple models without structural alignments

MOTIVATION: Accurate prediction of the quality of 3D models is a key component of successful protein tertiary structure prediction methods. Currently, clustering- or consensus-based Model Quality Assessment Programs (MQAPs) are the most accurate methods for predicting 3D model quality; however, they are often CPU-intensive because they carry out multiple structural alignments in order to compare numerous models. In this study, we describe ModFOLDclustQ, a novel MQAP that compares 3D models of proteins without the need for CPU-intensive structural alignments by using the Q measure for model comparisons. The ModFOLDclustQ method is benchmarked against the top established methods in terms of both accuracy and speed. In addition, the ModFOLDclustQ scores are combined with those from our older ModFOLDclust method to form a new method, ModFOLDclust2, that aims to provide increased prediction accuracy with negligible computational overhead.

RESULTS: The ModFOLDclustQ method is competitive with leading clustering-based MQAPs for the prediction of global model quality, yet it is up to 150 times faster than the previous version of the ModFOLDclust method at comparing models of small proteins (<60 residues) and over five times faster at comparing models of large proteins (>800 residues). Furthermore, a significant improvement in accuracy over the previous clustering-based MQAPs can be gained by combining the scores from ModFOLDclustQ and ModFOLDclust to form the new ModFOLDclust2 method, with little impact on the overall time taken for each prediction.

AVAILABILITY: The ModFOLDclustQ and ModFOLDclust2 methods are available to download from http://www.reading.ac.uk/bioinf/downloads/.
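To illustrate why an alignment-free comparison can be cheap, the following is a minimal sketch of a Q-style comparison between models of the same target sequence. It is not the ModFOLDclustQ implementation. It assumes one commonly used form of the Q measure from energy landscape theory, in which the C-alpha distance between residues i and j is compared across two models with a Gaussian tolerance sigma_ij = |i - j|^0.15, and it assumes that a model's consensus score is its mean Q against all other models. The function names `q_score` and `consensus_scores`, the exponent default, and the requirement that both models carry coordinates for the same residues are all assumptions made for illustration.

```python
import math


def q_score(ca_a, ca_b, sigma_exponent=0.15):
    """Alignment-free similarity of two models given as lists of C-alpha (x, y, z).

    Assumed form: the mean over residue pairs (i, j) with j > i + 1 of
    exp(-(d_a - d_b)^2 / (2 * sigma_ij^2)), where sigma_ij = |i - j|^sigma_exponent.
    No superposition is needed because only intra-model distances are compared.
    """
    if len(ca_a) != len(ca_b):
        raise ValueError("both models must cover the same residues")
    n = len(ca_a)
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 2, n):  # skip sequence-adjacent residues
            d_a = math.dist(ca_a[i], ca_a[j])
            d_b = math.dist(ca_b[i], ca_b[j])
            sigma = (j - i) ** sigma_exponent
            total += math.exp(-((d_a - d_b) ** 2) / (2.0 * sigma ** 2))
            pairs += 1
    return total / pairs if pairs else 0.0


def consensus_scores(models):
    """Hypothetical consensus step: each model's mean Q against every other model."""
    scores = []
    for k, model in enumerate(models):
        others = [q_score(model, other) for m, other in enumerate(models) if m != k]
        scores.append(sum(others) / len(others) if others else 0.0)
    return scores
```

Under these assumptions, two identical models score 1 and the score falls toward 0 as their internal distances diverge. Each pairwise comparison costs on the order of N^2 distance evaluations for N residues, with no superposition or alignment search.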
