Secure multiparty computation for privacy-preserving drug discovery

MOTIVATION: Quantitative structure-activity relationship (QSAR) modeling and drug-target interaction (DTI) prediction are both widely used in drug discovery, and collaboration among pharmaceutical institutions can improve performance on both tasks. However, concerns over drug-related data privacy and intellectual property have become a significant obstacle to inter-institutional collaboration in drug discovery.

RESULTS: We have developed two novel algorithms based on secure multiparty computation (MPC), QSARMPC and DTIMPC, which let pharmaceutical institutions collaborate effectively on drug discovery without divulging private drug-related information. QSARMPC, a neural network model under MPC, shows good scalability and performance, and is feasible for privacy-preserving collaboration on large-scale QSAR prediction. DTIMPC integrates heterogeneous drug-related network data and accurately predicts novel DTIs while keeping drug information confidential. Under several experimental settings that reflect real drug discovery scenarios, we demonstrate that DTIMPC significantly outperforms the baseline methods, generates novel DTI predictions supported by evidence from the literature, and scales to handle growing DTI data. Together, these results indicate that QSARMPC and DTIMPC can serve as practical tools for privacy-preserving drug discovery.

AVAILABILITY AND IMPLEMENTATION: The source code of QSARMPC and DTIMPC is available on GitHub: https://github.com/rongma6/QSARMPC_DTIMPC.git.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
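The abstract does not say which MPC protocol QSARMPC and DTIMPC build on, so the sketch below illustrates only the general idea behind MPC-based collaboration, not the authors' method. It assumes generic additive secret sharing over the ring Z_{2^64} with fixed-point encoding, semi-honest parties, and a trusted dealer that supplies Beaver multiplication triples. Each institution splits its private value into random shares. Additions are computed locally on shares, and multiplications (the core operation of a neural network layer) consume one Beaver triple, so that no single party ever sees another party's raw input. All names (`MODULUS`, `SCALE`, `encode`, `decode`, `share`, `reconstruct`, `add_shares`, `beaver_multiply`) are hypothetical.

```python
import secrets

# Hypothetical parameters for this sketch, not taken from QSARMPC/DTIMPC.
MODULUS = 2**64   # all arithmetic is done in the ring Z_{2^64}
SCALE = 2**16     # fixed-point scaling factor for real numbers


def encode(x: float) -> int:
    """Fixed-point encode a real number as a ring element."""
    return round(x * SCALE) % MODULUS


def decode(v: int, scale: int = SCALE) -> float:
    """Decode a ring element, treating the upper half of the ring as negative."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / scale


def share(value: int, n_parties: int) -> list[int]:
    """Split a ring element into n additive shares; any n-1 of them look uniformly random."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Open a shared value by summing all of its shares."""
    return sum(shares) % MODULUS


def add_shares(x_sh: list[int], y_sh: list[int]) -> list[int]:
    """Addition is local: each party adds the two shares it holds."""
    return [(x + y) % MODULUS for x, y in zip(x_sh, y_sh)]


def beaver_multiply(x_sh: list[int], y_sh: list[int]) -> list[int]:
    """Multiply two shared values using a Beaver triple (a, b, c = a*b).

    Assumption: a trusted dealer generates the triple here; practical
    protocols generate triples without one.
    """
    n = len(x_sh)
    a, b = secrets.randbelow(MODULUS), secrets.randbelow(MODULUS)
    a_sh, b_sh, c_sh = share(a, n), share(b, n), share(a * b % MODULUS, n)
    # Opening d = x - a and e = y - b reveals nothing, since a and b are uniform masks.
    d = reconstruct([(xi - ai) % MODULUS for xi, ai in zip(x_sh, a_sh)])
    e = reconstruct([(yi - bi) % MODULUS for yi, bi in zip(y_sh, b_sh)])
    # x*y = c + d*b + e*a + d*e; the public term d*e is added by party 0 only.
    z_sh = [(ci + d * bi + e * ai) % MODULUS for ci, ai, bi in zip(c_sh, a_sh, b_sh)]
    z_sh[0] = (z_sh[0] + d * e) % MODULUS
    return z_sh


# Usage: three institutions, each holding one private value.
inputs = [0.25, -1.5, 3.0]
shared = [share(encode(v), 3) for v in inputs]  # shared[i][j] goes to party j
total = shared[0]
for s in shared[1:]:
    total = add_shares(total, s)
print(decode(reconstruct(total)))  # 1.75

product = beaver_multiply(shared[1], shared[2])
# The product of two encodings carries a factor SCALE**2, so decode at that scale.
print(decode(reconstruct(product), SCALE**2))  # -4.5
```

One simplification is worth noting: fixed-point MPC frameworks usually truncate the shares after each multiplication to return to scale `SCALE`. This sketch instead opens the product and decodes it at `SCALE**2`, which is only safe for the small magnitudes used here.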
