Toward more realistic drug–target interaction predictions

A number of supervised machine learning models have recently been introduced for predicting drug–target interactions from chemical structure and genomic sequence information. Although these models could improve many network pharmacology applications, such as repositioning drugs for new therapeutic uses, they are often constructed and evaluated under overly simplified settings that do not reflect the real-life problem faced in practical applications. Using quantitative drug–target bioactivity assays for kinase inhibitors, as well as a popular benchmarking data set of binary drug–target interactions for enzyme, ion channel, nuclear receptor and G protein-coupled receptor targets, we illustrate the effects of four factors that may lead to dramatic differences in the prediction results: (i) problem formulation (standard binary classification or a more realistic regression formulation), (ii) evaluation data set (the drug and target families in the application use case), (iii) evaluation procedure (simple or nested cross-validation) and (iv) experimental setting (whether the training and test sets share common drugs and targets, only drugs, only targets, or neither). Each of these factors should be taken into consideration to avoid reporting overoptimistic drug–target interaction prediction results. We also suggest guidelines for making supervised drug–target interaction prediction studies more realistic, through model formulations and evaluation setups that better address the inherent complexity of the prediction task in practical applications, and we propose novel benchmarking data sets that capture the continuous nature of drug–target interactions for kinase inhibitors.
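The fourth factor, the experimental setting, is easiest to see by constructing the splits explicitly. The sketch below assumes the bioactivities are stored as a dense drugs × targets matrix and uses NumPy to build boolean train/test masks. The helper names `split_pairs` and `shared_entities` and the setting labels `"shared"`, `"new_drugs"`, `"new_targets"` and `"new_both"` are hypothetical, chosen for illustration; they are not terminology or code from the study. When neither drugs nor targets are shared, pairs involving exactly one held-out drug or target are excluded from both sets, so no held-out entity is seen during training.

```python
import numpy as np

# Hypothetical labels for the four experimental settings.
SETTINGS = ("shared", "new_drugs", "new_targets", "new_both")


def split_pairs(n_drugs, n_targets, setting, test_frac=0.25, seed=0):
    """Boolean (train, test) masks over an n_drugs x n_targets pair matrix."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting!r}")
    rng = np.random.default_rng(seed)

    if setting == "shared":
        # Random pair-wise split: test pairs reuse drugs and targets seen in training.
        is_test = rng.random((n_drugs, n_targets)) < test_frac
        return ~is_test, is_test

    def held_out(n, hold):
        # Mark a random subset of entities (rows or columns) as held out.
        mask = np.zeros(n, dtype=bool)
        if hold:
            k = max(1, int(test_frac * n))
            mask[rng.choice(n, size=k, replace=False)] = True
        return mask

    hold_d = setting in ("new_drugs", "new_both")
    hold_t = setting in ("new_targets", "new_both")
    new_d = held_out(n_drugs, hold_d)
    new_t = held_out(n_targets, hold_t)

    # Test pairs: held-out drugs (rows) and/or held-out targets (columns).
    test_rows = new_d if hold_d else np.ones(n_drugs, dtype=bool)
    test_cols = new_t if hold_t else np.ones(n_targets, dtype=bool)
    test = test_rows[:, None] & test_cols[None, :]
    # Training pairs contain no held-out drug and no held-out target; in
    # "new_both", pairs with exactly one held-out entity are dropped entirely.
    train = ~new_d[:, None] & ~new_t[None, :]
    return train, test


def shared_entities(train, test):
    """Count drugs (rows) and targets (columns) occurring in both sets."""
    drugs = train.any(axis=1) & test.any(axis=1)
    targets = train.any(axis=0) & test.any(axis=0)
    return int(drugs.sum()), int(targets.sum())


for s in SETTINGS:
    train, test = split_pairs(20, 10, s)
    print(s, shared_entities(train, test))
```

For a 20 × 10 matrix, the held-out-drug split shares 0 drugs and all 10 targets with training, the held-out-target split shares all 20 drugs and 0 targets, and the split holding out both shares neither. The pair-wise random split typically reuses nearly every drug and target, which is why results under that setting can look overoptimistic. A full evaluation would rotate the held-out drugs and targets over cross-validation folds rather than rely on a single random split.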
