HEMEsPred: Structure-Based Ligand-Specific Heme Binding Residues Prediction by Using Fast-Adaptive Ensemble Learning Scheme

Heme is an essential biomolecule found in a wide range of extant organisms. Accurately identifying heme binding residues (HEMEs) is of great importance for understanding disease progression and for drug development. In this study, we propose a novel predictor, HEMEsPred, for predicting HEMEs. First, several sequence- and structure-based features, including amino acid composition, motifs, surface preferences, and secondary structure, were collected to construct feature matrices. Second, a novel fast-adaptive ensemble learning scheme was designed to overcome the serious class-imbalance problem and to enhance prediction performance. Third, we developed ligand-specific models, because different heme ligands vary significantly in their roles, sizes, and distributions. A statistical test confirmed the effectiveness of the ligand-specific models. Experimental results on benchmark datasets demonstrated the robustness of the proposed method. Furthermore, the method showed good generalization capability and outperformed many state-of-the-art predictors on two independent testing datasets. The HEMEsPred web server is available at http://www.inforstation.com/HEMEsPred/ for free academic use.
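The abstract does not specify how the fast-adaptive ensemble scheme works, so the following is only a minimal sketch of one common way an ensemble can address class imbalance: each base learner is trained on all binding (positive) residues plus an equally sized random sample of non-binding (negative) residues, and predictions are combined by majority vote. This is plain random-undersampling bagging with a nearest-centroid base learner, not the authors' method; every function name and the toy two-dimensional features are hypothetical and chosen purely for illustration.

```python
import random
from statistics import mean


def make_toy_data(n_pos=10, n_neg=200, seed=0):
    """Hypothetical imbalanced data: positives near (2, 2), negatives near (0, 0)."""
    rng = random.Random(seed)
    X = [[rng.gauss(2.0, 0.3), rng.gauss(2.0, 0.3)] for _ in range(n_pos)]
    X += [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(n_neg)]
    y = [1] * n_pos + [0] * n_neg
    return X, y


def train_centroid(X, y):
    """Base learner: store the mean feature vector of each class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closer (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return 1 if dist2(centroids[1]) < dist2(centroids[0]) else 0


def train_undersampled_ensemble(X, y, n_models=11, seed=0):
    """Train each base learner on all positives plus an equal-size negative sample.

    Assumes there are at least as many negatives as positives.
    """
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    models = []
    for _ in range(n_models):
        subset = pos + rng.sample(neg, len(pos))  # balanced training subset
        models.append(train_centroid([X[i] for i in subset],
                                     [y[i] for i in subset]))
    return models


def predict_ensemble(models, x):
    """Majority vote over the base learners."""
    votes = sum(predict_centroid(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0
```

Calling `train_undersampled_ensemble(*make_toy_data())` returns a list of base models that `predict_ensemble` can query for a single feature vector. Using an odd number of base learners avoids tied votes.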
