Drug-target interaction prediction with tree-ensemble learning and output space reconstruction

Background
Computational prediction of drug-target interactions (DTI) is vital for drug discovery. The experimental identification of interactions between drugs and target proteins is very onerous. Modern technologies have mitigated the problem, facilitating the development of new drugs. However, drug development remains extremely expensive and time-consuming. Therefore, in silico DTI prediction based on machine learning can alleviate the burdensome task of drug development. Many machine learning approaches have been proposed over the years for DTI prediction. Nevertheless, prediction accuracy and efficiency remain persistent problems that need to be tackled. Here, we propose a new learning method that addresses DTI prediction as a multi-output prediction task by learning ensembles of multi-output bi-clustering trees (eBICT) on reconstructed networks. In our setting, the nodes of a DTI network (drugs and proteins) are represented by features (background information). The interactions between the nodes of a DTI network are modeled as an interaction matrix, which composes the output space of our problem. The proposed approach integrates background information from both the drug and the target protein spaces into the same global network framework.

Results
We performed an empirical evaluation, comparing the proposed approach to state-of-the-art DTI prediction methods, and demonstrated its effectiveness in different prediction settings. For evaluation purposes, we used several benchmark datasets that represent drug-protein networks. We show that output space reconstruction can boost the predictive performance of tree-ensemble learning methods, yielding more accurate DTI predictions.

Conclusions
We proposed a new DTI prediction method in which bi-clustering trees are built on reconstructed networks. Building tree-ensemble learning models with output space reconstruction leads to superior prediction results, while preserving the advantages of tree ensembles, such as scalability, interpretability and applicability in the inductive setting.
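The abstract describes the formulation only in words: drugs and targets carry feature vectors, the interaction matrix is the output space, and bi-clustering trees are trained on a reconstructed version of that matrix. The following Python sketch (standard library only) illustrates these ideas under explicit assumptions. It is not the authors' eBICT implementation. The split criterion (summed per-column variance reduction), the leaf score (mean of the bicluster) and all function names are illustrative choices. `reconstruct` is a hypothetical neighbor-smoothing stand-in, because the abstract does not say how the output space is reconstructed. A full model would apply such splits recursively and average many randomized trees, and that part is omitted here.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Split = Tuple[float, Optional[str], Optional[int], Optional[float]]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns (drugs x targets -> targets x drugs)."""
    return [list(col) for col in zip(*m)]


def multi_output_sse(rows: Matrix) -> float:
    """Multi-output impurity: squared deviations from each column's mean, summed over columns."""
    if not rows:
        return 0.0
    total = 0.0
    for col in zip(*rows):
        m = sum(col) / len(col)
        total += sum((v - m) ** 2 for v in col)
    return total


def best_split(Y: Matrix, drug_feats: Matrix, target_feats: Matrix) -> Split:
    """Search one bi-clustering split over both feature spaces (illustrative criterion).

    A drug-feature test partitions the rows of Y (drugs are instances, targets are outputs).
    A target-feature test partitions the columns (Y is transposed so targets become instances).
    Returns (impurity reduction, axis, feature index, threshold); ties keep the first found."""
    best: Split = (0.0, None, None, None)
    for axis, M, feats in (("drug", Y, drug_feats), ("target", transpose(Y), target_feats)):
        parent = multi_output_sse(M)
        for j in range(len(feats[0])):
            # Candidate thresholds: every observed value of feature j except the largest
            for thr in sorted({f[j] for f in feats})[:-1]:
                left = [M[i] for i, f in enumerate(feats) if f[j] <= thr]
                right = [M[i] for i, f in enumerate(feats) if f[j] > thr]
                gain = parent - multi_output_sse(left) - multi_output_sse(right)
                if gain > best[0]:
                    best = (gain, axis, j, thr)
    return best


def bicluster_mean(Y: Matrix, drug_idx: List[int], target_idx: List[int]) -> float:
    """Score for a leaf covering a bicluster (subset of drugs x subset of targets): the block mean."""
    vals = [Y[i][j] for i in drug_idx for j in target_idx]
    return sum(vals) / len(vals)


def reconstruct(Y: Matrix, drug_sim: Matrix, alpha: float = 0.5) -> Matrix:
    """HYPOTHETICAL stand-in for output space reconstruction (not the paper's method):
    blend each drug's binary profile with the similarity-weighted mean profile of the other drugs."""
    n, m = len(Y), len(Y[0])
    out = []
    for i in range(n):
        w = [drug_sim[i][k] if k != i else 0.0 for k in range(n)]  # ignore self-similarity
        z = sum(w)
        neigh = [sum(w[k] * Y[k][j] for k in range(n)) / z if z else 0.0 for j in range(m)]
        out.append([alpha * Y[i][j] + (1 - alpha) * neigh[j] for j in range(m)])
    return out
```

For example, on a 4 × 4 interaction matrix made of two diagonal blocks, `best_split` returns a drug-feature test that separates the blocks and leaves zero impurity in both children. Passing `reconstruct(Y, drug_sim)` instead of the binary `Y` to `best_split` turns unobserved pairs of similar drugs into intermediate scores, which is the kind of continuous output space a reconstruction step is meant to provide.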