HemeBIND: a novel method for heme binding residue prediction by combining structural and sequence information

Background

Accurate prediction of the residues involved in interactions between proteins and small ligands is one of the major challenges in structural bioinformatics. Heme is an essential and widely used ligand that plays critical roles in electron transfer, catalysis, signal transduction and gene expression. Over the last decade, much effort has been devoted to developing generic algorithms for ligand binding site prediction. However, no algorithm has been designed specifically to complement experimental techniques in identifying heme binding residues. There is therefore an urgent need for a computational method that recognizes these important residues.

Results

Here we introduce HemeBIND, an efficient algorithm that predicts heme binding residues by integrating structural and sequence information. We systematically investigated the characteristics of binding interfaces using a non-redundant dataset of heme-protein complexes. Several sequence and structural attributes, such as evolutionary conservation, solvent accessibility, depth and protrusion, clearly distinguish heme binding residues from non-binding residues. These features were used separately or in combination to build structure-based classifiers with a support vector machine (SVM). The results showed that the information in these features is largely complementary, and their combination achieved the best performance. To further improve performance, we developed a post-processing procedure that reduces the number of false positives. In addition, we built a sequence-based classifier from an SVM and sequence profiles, as an alternative for cases in which only sequence information is available.
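The abstract does not spell out how the sequence profile is turned into SVM input. Profile-based residue predictors commonly concatenate the position-specific scoring matrix (PSSM) rows in a sliding window centered on each residue. That encoding is assumed here and is not taken from HemeBIND. The minimal sketch below also assumes the following:

- raw log-odds PSSM scores as input
- logistic scaling of those scores to (0, 1)
- zero padding past either terminus
- a hypothetical half-window of 3 residues

All function names are illustrative.

```python
import math
from typing import List, Sequence


def scale_pssm(pssm: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map raw PSSM log-odds scores into (0, 1) with the logistic function (assumed scaling)."""
    return [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in pssm]


def window_features(pssm: Sequence[Sequence[float]], center: int,
                    half_window: int = 3) -> List[float]:
    """Concatenate the PSSM rows of the 2*half_window+1 residues around `center`."""
    n_cols = len(pssm[0])
    feats: List[float] = []
    for pos in range(center - half_window, center + half_window + 1):
        if 0 <= pos < len(pssm):
            feats.extend(pssm[pos])
        else:
            feats.extend([0.0] * n_cols)  # zero padding beyond either terminus
    return feats


def encode_protein(raw_pssm: Sequence[Sequence[float]],
                   half_window: int = 3) -> List[List[float]]:
    """Return one fixed-length feature vector per residue, ready for an SVM."""
    scaled = scale_pssm(raw_pssm)
    return [window_features(scaled, i, half_window) for i in range(len(scaled))]


if __name__ == "__main__":
    # Hypothetical 5-residue profile with 20 amino-acid columns.
    toy = [[float(i - j) for j in range(20)] for i in range(5)]
    vectors = encode_protein(toy, half_window=1)
    print(len(vectors), len(vectors[0]))  # 5 60
```

Each vector has (2 × half_window + 1) × 20 entries. Every residue therefore gets a fixed-length input that a standard SVM implementation can consume.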
Finally, we used a voting method to combine the outputs of the structure-based and sequence-based classifiers. The combined predictor performed markedly better than either classifier alone.

Conclusions

HemeBIND is the first algorithm designed specifically to predict heme binding residues in protein structures. Extensive experiments indicated that both the structure-based and sequence-based methods effectively identify heme binding residues. Because the two methods are complementary, combining them significantly improves prediction performance. To make the method widely usable, we developed the HemeBIND web server, freely accessible at http://mleg.cse.sc.edu/hemeBIND/.
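The abstract does not state the exact voting rule used to combine the classifiers. One simple reading, sketched here as an assumption, proceeds in two steps:

1. Threshold each classifier's SVM decision value into a binary per-residue call.
2. Predict a residue as binding when at least `min_votes` classifiers agree.

With the two classifiers named above, `min_votes=1` takes the union of their predictions, which favors sensitivity. Setting `min_votes=2` takes their intersection, which favors fewer false positives. The thresholds, scores and function names below are hypothetical.

```python
from typing import List, Sequence


def to_calls(scores: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Turn per-residue decision values into binary calls (1 = binding); threshold is assumed."""
    return [1 if s > threshold else 0 for s in scores]


def vote(calls_per_classifier: Sequence[Sequence[int]], min_votes: int) -> List[int]:
    """Predict a residue as binding when at least `min_votes` classifiers call it binding."""
    n = len(calls_per_classifier[0])
    if any(len(calls) != n for calls in calls_per_classifier):
        raise ValueError("all classifiers must score the same residues")
    return [1 if sum(calls[i] for calls in calls_per_classifier) >= min_votes else 0
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical decision values for four residues.
    structure = to_calls([1.2, -0.3, 0.4, -1.0])   # [1, 0, 1, 0]
    sequence = to_calls([0.8, 0.5, -0.2, -0.7])    # [1, 1, 0, 0]
    print(vote([structure, sequence], min_votes=1))  # [1, 1, 1, 0] (union)
    print(vote([structure, sequence], min_votes=2))  # [1, 0, 0, 0] (intersection)
```

The same function accepts more than two voters. For example, one could pass several structure-based models trained on different feature groups alongside the sequence-based model, which makes a true majority rule possible.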
