Learning with multiple pairwise kernels for drug bioactivity prediction

Motivation: Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind. Multiple kernel learning (MKL) in particular offers promising benefits, as it enables integrating various types of complex biomedical information sources in the form of kernels, along with learning their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making the existing MKL algorithms computationally infeasible even for a small number of input pairs.

Results: We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, which makes the method applicable to large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces. We show that pairwiseMKL provides accurate predictions using sparse solutions in terms of selected kernels, and that it therefore also automatically identifies the data sources relevant for the prediction problem.

Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco.
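To illustrate why a pairwise kernel matrix need not be formed explicitly, the following is a minimal sketch and not the pairwiseMKL implementation. It assumes the pairwise kernel is the Kronecker product of a drug kernel and a cell-line (or target) kernel, and it uses the standard identity (A ⊗ B) vec(V) = vec(A V Bᵀ) for row-major vectorization. The function name `kron_matvec` and the toy matrices `K_d` and `K_c` are hypothetical names chosen for this example.

```python
import numpy as np


def kron_matvec(K_d, K_c, v):
    """Return (K_d kron K_c) @ v without building the Kronecker product.

    K_d: (n_d, n_d) kernel over the first objects (e.g. drugs).
    K_c: (n_c, n_c) kernel over the second objects (e.g. cell lines).
    v:   vector of length n_d * n_c, ordered so that v[i * n_c + j]
         belongs to the pair (i, j) (row-major ordering, as in np.kron).
    """
    n_d, n_c = K_d.shape[0], K_c.shape[0]
    V = v.reshape(n_d, n_c)
    # Two small matrix products replace one huge matrix-vector product.
    return (K_d @ V @ K_c.T).ravel()


# Toy data (assumed for illustration only): random PSD kernels.
rng = np.random.default_rng(0)
X_d = rng.standard_normal((6, 4))
X_c = rng.standard_normal((5, 3))
K_d = X_d @ X_d.T          # 6 x 6 kernel
K_c = X_c @ X_c.T          # 5 x 5 kernel
v = rng.standard_normal(6 * 5)

fast = kron_matvec(K_d, K_c, v)
explicit = np.kron(K_d, K_c) @ v   # forms the full 30 x 30 matrix
```

For n_d drugs and n_c cell lines, the explicit route stores an (n_d·n_c) × (n_d·n_c) matrix, whereas the factored product only ever holds the two small kernels and an n_d × n_c array. Iterative solvers that need only matrix-vector products can be built on a routine of this kind.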
