StackIL6: a stacking ensemble model for improving the prediction of IL-6 inducing peptides

Antigenic peptides from pathogens, as well as immune cells, stimulate the release of interleukin (IL)-6, which activates aggressive inflammation. IL-6 inducing peptides are derived from pathogens and can serve both as diagnostic biomarkers for predicting stages of disease severity and as IL-6 inhibitors for suppressing aggressive multi-signaling immune responses. Accurate identification of IL-6 inducing peptides is therefore of great importance for investigating their mechanism of action and for developing diagnostic and immunotherapeutic applications. This study proposes a novel stacking ensemble model (termed StackIL6) for accurately identifying IL-6 inducing peptides. Specifically, StackIL6 was constructed from twelve feature descriptors drawn from three major groups of features (composition-based, composition-transition-distribution-based and physicochemical properties-based) and five popular machine learning algorithms (extremely randomized trees, logistic regression, multi-layer perceptron, support vector machine and random forest). The resulting baseline models were systematically integrated through a stacking strategy to build the final meta-model. Extensive benchmarking experiments demonstrated that StackIL6 achieved significantly better performance than the existing method (IL6PRED) and outperformed its constituent baseline models on both the training and independent test datasets, supporting its discrimination and generalization abilities. To facilitate easy access, the StackIL6 model is available as a free web server at http://camt.pythonanywhere.com/StackIL6. It is anticipated that StackIL6 can facilitate rapid screening of promising IL-6 inducing peptides for the development of diagnostic and immunotherapeutic applications.
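The data flow of the stacking strategy described above can be illustrated with a minimal sketch. It is not the StackIL6 implementation: the abstract does not name the twelve descriptors, so amino acid composition (`aac`) and dipeptide composition (`dpc`) are assumed here as two common composition-based examples, and `make_linear_scorer` is a hypothetical stand-in for a trained baseline classifier, with arbitrary untrained weights. The sketch shows only how each (descriptor, model) pair contributes one probability to the meta-feature vector that the meta-model consumes.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(peptide):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must not be empty")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dpc(peptide):
    """Dipeptide composition: fraction of each ordered residue pair (400 values)."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must not be empty")
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)  # avoid division by zero for length-1 input
    return [pairs[a + b] / total for a in AMINO_ACIDS for b in AMINO_ACIDS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_linear_scorer(weights, bias=0.0):
    """Hypothetical stand-in for a trained classifier: returns a probability."""
    def predict_proba(features):
        return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
    return predict_proba


def meta_features(peptide, baselines):
    """One probability per (descriptor, model) pair; input to the meta-model."""
    return [model(descriptor(peptide)) for descriptor, model in baselines]


# Toy baselines with arbitrary weights (assumption: NOT trained on any data).
baselines = [
    (aac, make_linear_scorer([0.5] * 20)),
    (dpc, make_linear_scorer([-0.5] * 400)),
]
# Toy meta-model that combines the baseline probabilities.
meta_model = make_linear_scorer([1.0, 1.0], bias=-1.0)

peptide = "ACDKLLA"  # made-up sequence for illustration only
z = meta_features(peptide, baselines)
score = meta_model(z)
print(len(z), round(score, 3))
```

In a real pipeline the baseline and meta-model weights would be learned from labeled peptides. Stacking implementations commonly train the meta-model on cross-validated baseline predictions to limit overfitting; whether StackIL6 does so is not stated in this abstract.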