Drug-target interaction prediction via class imbalance-aware ensemble learning

Background: Multiple computational methods for predicting drug-target interactions have been developed to facilitate the drug discovery process. These methods train classifiers on known drug-target interactions in order to predict new, undiscovered ones. However, a key challenge in this data that these methods have not yet addressed, namely class imbalance, may be degrading prediction performance. Class imbalance can be divided into two sub-problems. First, known interacting drug-target pairs are far fewer than non-interacting pairs. This disparity is referred to as between-class imbalance. It biases predictions towards the majority class (the non-interacting pairs), leading to more prediction errors in the minority class (the interacting pairs). Second, the data contain multiple types of drug-target interactions, some of which have relatively few members (that is, are less well represented) compared with others. This variation in representation is referred to as within-class imbalance. It biases predictions towards the better-represented interaction types, leading to more prediction errors in the less-represented types.

Results: We propose an ensemble learning method that incorporates techniques to address both between-class and within-class imbalance. Experiments show that the proposed method improves results over four state-of-the-art methods. In addition, we simulated cases for new drugs and targets, that is, drugs and targets for which no prior interactions are known, to see how our method would perform in predicting their interactions. Our method displayed satisfactory prediction performance and was able to predict many of the interactions successfully.

Conclusions: Our proposed method improves prediction performance over existing work, demonstrating the importance of addressing class imbalance in the data.
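
The two kinds of imbalance described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, since the abstract does not specify the techniques used. It assumes that each interacting pair already carries a group label for its interaction type (the `groups` argument is hypothetical and might, for example, come from clustering), that within-class imbalance is handled by random oversampling of small groups, that between-class imbalance is handled by giving each base learner a balanced random subset of negatives, and that a nearest-centroid rule stands in for the real base classifier. All function names and the toy data are illustrative.

```python
import random
from collections import defaultdict


def oversample_groups(positives, groups, rng):
    """Within-class step: resample every positive group up to the largest group's size."""
    by_group = defaultdict(list)
    for x, g in zip(positives, groups):
        by_group[g].append(x)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Draw extra copies (with replacement) for under-represented groups.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_ensemble(positives, groups, negatives, n_learners=5, seed=0):
    """Train n_learners stand-in base learners, each on a balanced sample."""
    rng = random.Random(seed)
    pos = oversample_groups(positives, groups, rng)
    learners = []
    for _ in range(n_learners):
        # Between-class step: each learner sees as many negatives as positives.
        neg = rng.sample(negatives, min(len(pos), len(negatives)))
        # Stand-in base learner: one centroid per class.
        learners.append((centroid(pos), centroid(neg)))
    return learners


def predict_score(learners, x):
    """Fraction of base learners that vote 'interacting' for feature vector x."""
    votes = sum(sq_dist(x, cp) < sq_dist(x, cn) for cp, cn in learners)
    return votes / len(learners)


# Toy data: three positives of type 0, one positive of the rarer type 1,
# and twenty negatives (the majority class).
positives = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [5.0, 5.0]]
groups = [0, 0, 0, 1]
negatives = [[-5.0 - 0.1 * i, -5.0 + 0.05 * i] for i in range(20)]

learners = train_ensemble(positives, groups, negatives)
print(predict_score(learners, [5.0, 5.0]))    # query near the rare positive type
print(predict_score(learners, [-5.0, -5.0]))  # query near the negatives
```

Averaging votes over several balanced samples lets every negative pair contribute to some base learner without any single learner being dominated by the majority class, while oversampling keeps the rarer interaction type from being outweighed by the common one.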
