iGHBP: Computational identification of growth hormone binding proteins from sequences using extremely randomised tree

Growth hormone binding protein (GHBP) is a soluble carrier protein that selectively and non-covalently interacts with growth hormone, thereby acting as a modulator or inhibitor of growth hormone signalling. Accurate identification of GHBPs from a given protein sequence provides important clues for understanding cell growth and cellular mechanisms. In the postgenomic era, protein sequence data have accumulated rapidly, so it is crucial to develop an automated computational method that enables fast and accurate identification of putative GHBPs among a vast number of candidate proteins. In this study, we describe a novel machine-learning-based predictor called iGHBP for the identification of GHBPs. To predict GHBPs from a given protein sequence, we trained an extremely randomised tree with an optimal feature set, obtained by applying a two-step feature selection protocol to a combination of dipeptide composition and amino acid index values. During cross-validation analysis, iGHBP achieved an accuracy of 84.9%, which was ~7% higher than the control extremely randomised tree predictor trained with all features, demonstrating the effectiveness of our feature selection protocol. Furthermore, when objectively evaluated on an independent data set, iGHBP displayed superior performance compared to the existing method. A user-friendly web server that implements iGHBP has been established and is available at http://thegleelab.org/iGHBP.
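As an illustration of one of the two feature types named above, the following is a minimal sketch of dipeptide composition encoding: a sequence is mapped to a 400-dimensional vector holding the relative frequency of each ordered pair of the 20 standard amino acids. The function name `dipeptide_composition`, the skipping of pairs that contain non-standard residues, and the normalisation by the number of valid pairs are assumptions made for this example; the abstract does not specify these details, and the sketch does not cover the amino acid index features, the two-step feature selection, or the classifier itself.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order that defines the feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional dipeptide composition vector.

    Hypothetical helper for illustration: overlapping residue pairs are
    counted, pairs containing a non-standard residue are skipped, and
    counts are divided by the number of valid pairs (an assumption).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    valid_pairs = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            valid_pairs += 1
    if valid_pairs == 0:
        # Sequence too short or made only of non-standard residues.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / valid_pairs for d in DIPEPTIDES]


if __name__ == "__main__":
    features = dipeptide_composition("ACAC")
    # Show only the non-zero entries of the feature vector.
    for name, value in zip(DIPEPTIDES, features):
        if value > 0:
            print(name, round(value, 3))
```

A vector of this kind could be concatenated with other descriptors and passed to a tree-ensemble classifier; how iGHBP combines and selects the features is described only at a high level in the abstract.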
