FunFOLD: an improved automated method for the prediction of ligand binding residues using 3D models of proteins

Background
The accurate prediction of ligand binding residues from amino acid sequences is important for the automated functional annotation of novel proteins. In the previous two CASP experiments, the most successful methods in the function prediction category were those that used structural superpositions of 3D models and related templates with bound ligands to identify putative contacting residues. However, whilst most of this prediction process can be automated, visual inspection and manual adjustment of parameters, such as the distance thresholds used for each target, have often been required to prevent over-prediction. Here we describe a novel method, FunFOLD, which uses an automatic approach for cluster identification and residue selection. The software can easily be integrated into existing fold recognition servers, requiring only a 3D model and a list of templates as inputs. A simple web interface is also provided, allowing access for non-expert users. The method has been benchmarked against the top servers and manual prediction groups tested at both CASP8 and CASP9.

Results
The FunFOLD method shows a significant improvement over the best available servers and is competitive with the top manual prediction groups that were tested at CASP8. It is also competitive with both the top server and manual methods tested at CASP9. When tested on common subsets of targets, FunFOLD predictions achieve significantly higher mean Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) scores than all server methods tested at CASP8. Testing on the CASP9 set showed no statistically significant separation in performance between FunFOLD and the other top server groups tested.

Conclusions
The FunFOLD software is freely available as both a standalone package and a prediction server, providing competitive ligand binding site residue predictions for expert and non-expert users alike. The software provides a new, fully automated approach for structure-based function prediction using 3D models of proteins.
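The abstract refers to two generic ingredients of template-based binding site prediction. The first is selecting model residues that lie within a distance threshold of a ligand taken from a superposed template. The second is scoring predicted residues against observed ones with the MCC, defined as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below illustrates both under stated assumptions. It is not FunFOLD's own cluster identification and residue selection procedure, which the abstract does not detail. The function names `contacting_residues` and `mcc`, the dictionary layout for coordinates, and the toy inputs are hypothetical, and coordinates are assumed to be already superposed.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]


def contacting_residues(residue_atoms: Dict[int, List[Point]],
                        ligand_atoms: List[Point],
                        cutoff: float) -> Set[int]:
    """Residues with any atom within `cutoff` (Angstroms) of any ligand atom.

    Assumes the model and the template ligand already share one coordinate
    frame, e.g. after superposing the model onto a ligand-bound template.
    """
    selected = set()
    for resnum, atoms in residue_atoms.items():
        if any(math.dist(a, b) <= cutoff for a in atoms for b in ligand_atoms):
            selected.add(resnum)
    return selected


def mcc(predicted: Set[int], observed: Set[int], n_residues: int) -> float:
    """Matthews Correlation Coefficient over a protein of n_residues residues.

    Assumes both sets contain only residue numbers belonging to the protein,
    so every residue outside both sets counts as a true negative.
    """
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    tn = n_residues - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal is empty (denominator is zero).
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy 10-residue protein: predicted {1, 2, 3} vs observed {2, 3, 4}.
    print(round(mcc({1, 2, 3}, {2, 3, 4}, 10), 3))  # prints 0.524
```

Because the MCC uses all four cells of the confusion matrix, loosening the distance threshold typically adds false positives and lowers the score, which is one way to see why over-prediction is penalised.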