BE-DTI': Ensemble framework for drug target interaction prediction using dimensionality reduction and active learning

BACKGROUND AND OBJECTIVE: Drug-target interaction (DTI) prediction plays an integral role in the drug discovery process. Predicting novel drugs and targets helps identify optimal drug therapies for a range of serious diseases. Computational prediction of drug-target interactions can help identify potential drug-target pairs and speed up drug repositioning. In the present work, we focus on machine learning algorithms for predicting drug-target interactions from the pool of existing drug-target data. The key idea is to train a classifier on known DTIs so that it can predict new or unknown DTIs. However, challenges such as class imbalance and the high dimensionality of the data must be addressed before an effective drug-target interaction model can be developed.

METHODS: In this paper, we propose a bagging-based ensemble framework named BE-DTI' for drug-target interaction prediction that uses dimensionality reduction and active learning to deal with class-imbalanced data. Active learning helps to improve under-sampling bagging-based ensembles. Dimensionality reduction is used to deal with high-dimensional data.

RESULTS: In 10-fold cross-validation experiments, the proposed technique outperforms five competing methods, achieving AUC = 0.927, sensitivity = 0.886, specificity = 0.864, and G-mean = 0.874.

CONCLUSION: The proposed framework is used to predict missing and new interactions. To check its accuracy, some known interactions are removed from the original dataset and then re-predicted. The approach is also validated on an external dataset. All these results show that structurally similar drugs tend to interact with similar targets.
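As a rough illustration of the under-sampling bagging idea and the G-mean metric named in the abstract, the following is a minimal Python sketch. It is not the BE-DTI' implementation: the nearest-centroid base learner, the function names (`g_mean_report`, `fit_undersampled_bag`, `predict`), and the toy data are assumptions made for illustration, and the dimensionality-reduction and active-learning steps are omitted. G-mean is taken here as the geometric mean of sensitivity and specificity, which is its usual definition.

```python
import math
import random


def g_mean_report(y_true, y_pred):
    """Return (sensitivity, specificity, G-mean) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec, math.sqrt(sens * spec)


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def fit_undersampled_bag(X, y, n_models=11, seed=0):
    """Train n_models base learners, each on a class-balanced subsample.

    Hypothetical base learner: a (positive centroid, negative centroid) pair.
    Assumes both classes are present in y.
    """
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    k = min(len(pos), len(neg))
    models = []
    for _ in range(n_models):
        pos_sample = [rng.choice(pos) for _ in range(k)]  # bootstrap minority
        neg_sample = rng.sample(neg, k)                   # under-sample majority
        models.append((centroid(pos_sample), centroid(neg_sample)))
    return models


def predict(models, x):
    """Majority vote: each model votes for the class with the nearer centroid."""
    votes = sum(1 for c_pos, c_neg in models
                if math.dist(x, c_pos) < math.dist(x, c_neg))
    return 1 if 2 * votes > len(models) else 0


# Toy imbalanced data: 3 interacting pairs (label 1) vs 12 non-interacting (label 0).
X_pos = [(2.0, 2.0), (2.2, 1.8), (1.8, 2.1)]
X_neg = [(i * 0.1, j * 0.1) for i in range(-2, 2) for j in range(-1, 2)]
X = X_pos + X_neg
y = [1] * len(X_pos) + [0] * len(X_neg)

models = fit_undersampled_bag(X, y, n_models=5, seed=1)
y_pred = [predict(models, x) for x in X]
print(g_mean_report(y, y_pred))
```

Each base learner sees as many negatives as positives, so no single model is dominated by the majority class. Reporting G-mean alongside sensitivity and specificity penalizes a classifier that does well on only one of the two classes.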
