PDC-SGB: Prediction of effective drug combinations using a stochastic gradient boosting algorithm.

Combinatorial therapy is a promising strategy for combating complex diseases because it can improve efficacy and reduce side effects. To facilitate the identification of drug combinations in pharmacology, we propose a new computational model, termed PDC-SGB, that predicts effective drug combinations by integrating biological, chemical and pharmacological information using a stochastic gradient boosting algorithm. First, a set of 352 gold-standard positive samples was collected from a public drug combination database. Then, a 732-dimensional feature vector comprising biological, chemical and pharmacological information was constructed for each drug combination to describe its properties. To avoid overfitting, the maximum relevance and minimum redundancy (mRMR) method was applied to select informative features and remove redundant ones. Based on the selected features, three different types of classification algorithms were employed to build drug combination prediction models. Our results demonstrate that the model based on the stochastic gradient boosting algorithm yielded the best performance. Furthermore, the feature patterns of therapy showed a strong ability to discriminate effective drug combinations from non-effective ones. Analysis of the individual features shows that enriched features that occur frequently in the gold-standard positive samples can help predict novel drug combinations.
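For readers unfamiliar with the classifier named above, the following is a minimal sketch of stochastic gradient boosting for binary classification, not the PDC-SGB implementation itself. It assumes decision stumps as base learners, logistic loss, and leaf values set to the mean residual (a simplification of the usual leaf update). All function names, parameter defaults, and the two-feature toy data are hypothetical and unrelated to the 732-dimensional drug combination features. The "stochastic" part is that each round fits its stump on a random subsample of the training rows drawn without replacement.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, r, idx):
    """Least-squares regression stump fitted to residuals r on the rows in idx."""
    best = None
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in idx})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r[i] for i in idx if X[i][j] <= t]
            right = [r[i] for i in idx if X[i][j] > t]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((v - lmean) ** 2 for v in left)
                   + sum((v - rmean) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lmean, rmean)
    if best is None:
        # Every sampled row is identical: fall back to a constant prediction.
        mean = sum(r[i] for i in idx) / len(idx)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, x):
    j, t, lmean, rmean = stump
    return lmean if x[j] <= t else rmean


def fit_sgb(X, y, n_rounds=30, learning_rate=0.1, subsample=0.5, seed=0):
    """Boost stumps on logistic-loss residuals, one random subsample per round."""
    rng = random.Random(seed)
    n = len(X)
    p = min(max(sum(y) / n, 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))          # initial score: log-odds of the prior
    scores = [f0] * n
    stumps = []
    k = min(n, max(1, int(subsample * n)))
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss with respect to the score.
        residuals = [y[i] - sigmoid(scores[i]) for i in range(n)]
        idx = rng.sample(range(n), k)   # subsample without replacement
        stump = fit_stump(X, residuals, idx)
        stumps.append(stump)
        for i in range(n):
            scores[i] += learning_rate * stump_predict(stump, X[i])
    return f0, learning_rate, stumps


def predict_proba(model, x):
    f0, learning_rate, stumps = model
    return sigmoid(f0 + learning_rate * sum(stump_predict(s, x) for s in stumps))


# Synthetic toy data (illustration only): the label depends on feature 0,
# feature 1 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(120)]
y = [1 if row[0] > 0.5 else 0 for row in X]
model = fit_sgb(X, y)
```

Calling `predict_proba(model, x)` returns the estimated probability of the positive class for a feature vector `x`. In practice a library implementation would be used instead; gradient boosting implementations commonly expose the subsampling fraction as a tunable parameter, and setting it below 1 is what makes the procedure stochastic.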
