Nonlinear data fusion over Entity–Relation graphs for Drug–Target Interaction prediction

Abstract

Motivation: The prediction of reliable Drug–Target Interactions (DTIs) is a key task in computer-aided drug design and repurposing. Here, we present a new data fusion approach for DTI prediction built on top of the NXTfusion library, which generalizes the Matrix Factorization paradigm by extending it to nonlinear inference over Entity–Relation graphs.

Results: We benchmarked our approach on five datasets and compared our models against state-of-the-art methods. Our models outperform most existing methods while retaining the flexibility to predict DTIs both as binary classification and as regression of the real-valued drug–target affinity, competing with models built explicitly for each task. Moreover, our findings suggest that the validation of DTI methods should be stricter than what has been proposed in some previous studies, and should focus on mimicking real-life DTI settings in which predictions are needed for previously unseen drugs, proteins, and drug–protein pairs. These settings are exactly the context in which the benefit of integrating heterogeneous information with our Entity–Relation data fusion approach is most evident.

Availability and implementation: All software and data are available at https://github.com/eugeniomazzone/CPI-NXTFusion and https://pypi.org/project/NXTfusion/.
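The stricter validation settings mentioned in the Results (unseen drugs, unseen proteins, unseen drug–protein pairs) can be illustrated with a minimal sketch. It is not taken from the paper or from the NXTfusion library; the function name `label_test_pairs`, the setting labels, and the toy identifiers are hypothetical and only show how a test pair could be assigned to a setting given the training pairs.

```python
def label_test_pairs(train_pairs, test_pairs):
    """Assign each test (drug, target) pair to a validation setting.

    Hypothetical helper for illustration only; the labels are assumptions,
    not terminology from the paper.
    """
    train_set = set(train_pairs)
    train_drugs = {drug for drug, _ in train_set}
    train_targets = {target for _, target in train_set}

    labels = {}
    for drug, target in test_pairs:
        if (drug, target) in train_set:
            # The exact pair was seen in training: not a valid test case.
            labels[(drug, target)] = "leaked"
        elif drug in train_drugs and target in train_targets:
            # Both entities were seen, but never together.
            labels[(drug, target)] = "unseen_pair"
        elif target in train_targets:
            # The target was seen; the drug is new.
            labels[(drug, target)] = "unseen_drug"
        elif drug in train_drugs:
            # The drug was seen; the target is new.
            labels[(drug, target)] = "unseen_target"
        else:
            # Neither entity appears in training.
            labels[(drug, target)] = "unseen_drug_and_target"
    return labels


# Toy data (made up for this example).
train = [("d1", "t1"), ("d1", "t2"), ("d2", "t1")]
test = [("d2", "t2"), ("d3", "t1"), ("d1", "t3"), ("d3", "t3"), ("d1", "t1")]
settings = label_test_pairs(train, test)
for pair, setting in settings.items():
    print(pair, setting)
```

Reporting performance separately for each label, rather than pooling all test pairs, is one way to see how a model behaves when a drug or protein has no known interactions at training time.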
