CFSBoost: Cumulative feature subspace boosting for drug-target interaction prediction.

Experimental identification of drug-target interactions is labor-intensive and expensive, which has motivated researchers to turn to in silico prediction to suggest potential interactions. In recent years, researchers have proposed several computational approaches for predicting new drug-target interactions. In this paper, we present CFSBoost, a simple and computationally cheap ensemble boosting classification model for the identification and prediction of drug-target interactions using evolutionary and structural features. CFSBoost uses a simple yet novel feature group selection procedure that keeps the model computationally very cheap while still achieving state-of-the-art performance. The ensemble model uses extra trees as weak learners inside a boosting scheme while retaining the best model at each iteration. We tested our method on four benchmark datasets, which are also referred to as gold standard datasets. Our method achieved a better area under the receiver operating characteristic curve (auROC) on 2 of the 4 datasets and a higher area under the precision-recall curve (auPR) on 3 of the 4 datasets. Researchers have argued that auPR is more suitable than auROC for comparing performance on imbalanced datasets such as our benchmark datasets. Our reported results show that, despite its simplicity in design, CFSBoost performs very satisfactorily compared with other methods in the literature. We also provide 5 new possible interactions for each dataset based on CFSBoost's prediction scores.
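
The point about auPR and auROC on imbalanced data can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the toy labels and scores are invented (they are not taken from the benchmark datasets), the helper names `roc_auc` and `average_precision` are hypothetical, and average precision is used as one common estimator of auPR, which may differ from how the paper computes it.

```python
# Toy illustration (invented data): on an imbalanced set, auROC can look
# strong while auPR exposes how many negatives outrank the positives.

def roc_auc(labels, scores):
    """auROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision measured at each positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# 95 negatives with scores 0.00 .. 0.94 and 5 positives placed near the top,
# each outranked by a few negatives (hypothetical classifier output).
neg_scores = [i / 100 for i in range(95)]
pos_scores = [0.905, 0.885, 0.865, 0.845, 0.825]

labels = [0] * len(neg_scores) + [1] * len(pos_scores)
scores = neg_scores + pos_scores

auroc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)

print(f"auROC = {auroc:.3f}")
print(f"auPR (average precision) = {aupr:.3f}")
```

In this toy case every positive sits in the top fifth of the ranking, so auROC is high, yet several negatives precede each positive, so precision at the top of the list is low and auPR is much smaller. With only 5% positives, a handful of false positives barely moves auROC but changes auPR substantially.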
