FunFOLDQA: A Quality Assessment Tool for Protein-Ligand Binding Site Residue Predictions

Estimating prediction quality is important because, without quality measures, it is difficult to determine the usefulness of a prediction. Currently, methods for ligand binding site residue prediction are assessed in the function prediction category of the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, using the Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) metrics. However, assessing ligand binding site predictions with such metrics requires solved structures with bound ligands. We have therefore developed a ligand binding site quality assessment tool, FunFOLDQA, which uses protein feature analysis to predict ligand binding site quality before the protein structures and their ligand interactions have been solved experimentally. The FunFOLDQA feature scores were combined in three ways: simple linear combinations, multiple linear regression and a neural network. The neural network produced significantly better correlations with both the MCC and BDT scores, according to Kendall’s τ, Spearman’s ρ and Pearson’s r correlation coefficients, when tested on both the CASP8 and CASP9 datasets. The neural network also produced the largest Area Under the Curve (AUC) score when Receiver Operating Characteristic (ROC) analysis was undertaken for the CASP8 dataset. Furthermore, the FunFOLDQA algorithm incorporating the neural network is shown to add value to FunFOLD when both methods are employed in combination. This results in a statistically significant improvement over all of the best server methods, the FunFOLD method (6.43%), and one of the top manual groups (FN293) tested on the CASP8 dataset. The FunFOLDQA method was also found to be competitive with the top server methods when tested on the CASP9 dataset.
To the best of our knowledge, FunFOLDQA is the first attempt to develop a method that can assess ligand binding site prediction quality in the absence of experimental data.
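
For readers unfamiliar with the MCC metric mentioned above, the following is a minimal sketch of how it could be computed for a binding site residue prediction once the observed binding residues are known. It uses the standard MCC formula over per-residue true/false positives and negatives; the function name `binding_site_mcc`, its parameters and the residue numbers are illustrative assumptions, not part of FunFOLDQA or the CASP assessment code.

```python
import math


def binding_site_mcc(predicted, observed, sequence_length):
    """Standard Matthews Correlation Coefficient for one prediction.

    predicted, observed: iterables of residue numbers (hypothetical inputs).
    sequence_length: total number of residues in the target protein.
    """
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)          # predicted and truly binding
    fp = len(predicted - observed)          # predicted but not binding
    fn = len(observed - predicted)          # binding but missed
    tn = sequence_length - tp - fp - fn     # correctly left unpredicted
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when the denominator vanishes.
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


# Illustrative residue numbers only; not taken from any CASP target.
example_mcc = binding_site_mcc({10, 11, 12, 40}, {10, 11, 12, 13}, 100)
```

Because the score needs the observed binding residues, it can only be calculated after a ligand-bound structure has been solved, which is the limitation that motivates a predictive quality estimate.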
