FL-QSAR: a federated learning based QSAR prototype for collaborative drug discovery

Motivation Quantitative structure-activity relationship (QSAR) analysis is commonly used in drug discovery. Collaborations among pharmaceutical institutions can lead to a better performance in QSAR prediction, however, intellectual property and related financial interests remain substantially hindering inter-institutional collaborations in QSAR modeling for drug discovery. Results For the first time, we verified the feasibility of applying the horizontal federated learning (HFL), which is a recently developed collaborative and privacy-preserving learning framework to perform QSAR analysis. A prototype platform of federated-learning-based QSAR modeling for collaborative drug discovery, i.e, FL-QSAR, is presented accordingly. We first compared the HFL framework with a classic privacy-preserving computation framework, i.e., secure multiparty computation (MPC) to indicate its difference from various perspective. Then we compared FL-QSAR with the public collaboration in terms of QSAR modeling. Our extensive experiments demonstrated that (1) collaboration by FL-QSAR outperforms a single client using only its private data, and (2) collaboration by FL-QSAR achieves almost the same performance as that of collaboration via cleartext learning algorithms using all shared information. Taking together, our results indicate that FL-QSAR under the HFL framework provides an efficient solution to break the barriers between pharmaceutical institutions in QSAR modeling, therefore promote the development of collaborative and privacy-preserving drug discovery with extendable ability to other privacy-related biomedical areas. Availability and implementation The source codes of the federated learning simulation and FL-QSAR are available on the GitHub: https://github.com/bm2-lab/FL-QSAR

[1]  David J. Wu,et al.  Secure genome-wide association analysis using multiparty computation , 2018, Nature Biotechnology.

[2]  Avi Wigderson,et al.  Completeness theorems for non-cryptographic fault-tolerant distributed computation , 1988, STOC '88.

[3]  Qiang Yang,et al.  Federated Machine Learning , 2019, ACM Trans. Intell. Syst. Technol..

[4]  Sepp Hochreiter,et al.  Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery , 2018, J. Chem. Inf. Model..

[5]  Dan Boneh,et al.  Deriving genomic diagnoses without revealing patient genomes , 2017, Science.

[6]  Lydia de la Torre,et al.  A Guide to the California Consumer Privacy Act of 2018 , 2018 .

[7]  Dan Bogdanov,et al.  Sharemind: A Framework for Fast Privacy-Preserving Computations , 2008, ESORICS.

[8]  Bin Li,et al.  Applications of machine learning in drug discovery and development , 2019, Nature Reviews Drug Discovery.

[9]  Ranadip Pal,et al.  IntegratedMRF: random forest‐based framework for integrating prediction from different data types , 2017, Bioinform..

[10]  Bo Zhu,et al.  MR fingerprinting Deep RecOnstruction NEtwork (DRONE) , 2017, Magnetic resonance in medicine.

[11]  Jin Gu,et al.  VASC: Dimension Reduction and Visualization of Single-cell RNA-seq Data by Deep Variational Autoencoder , 2018, Genom. Proteom. Bioinform..

[12]  Anne Condon,et al.  Interpretable dimensionality reduction of single cell transcriptome data with deep generative models , 2017, Nature Communications.

[13]  Paul Voigt,et al.  Introduction and ‘Checklist’ , 2017 .

[14]  Andrew Chi-Chih Yao,et al.  Protocols for secure computations , 1982, FOCS 1982.

[15]  Bonnie Berger,et al.  Realizing private and practical pharmacological collaboration , 2018, Science.

[16]  Robert P. Sheridan,et al.  Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships , 2015, J. Chem. Inf. Model..

[17]  Tianjian Chen,et al.  Federated Machine Learning: Concept and Applications , 2019 .

[18]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[19]  Blaise Agüera y Arcas,et al.  Federated Learning of Deep Networks using Model Averaging , 2016, ArXiv.

[20]  Anant Madabhushi,et al.  Accurate and reproducible invasive breast cancer detection in whole-slide images: A Deep Learning approach for quantifying tumor extent , 2017, Scientific Reports.

[21]  Paul Voigt,et al.  The Eu General Data Protection Regulation (Gdpr): A Practical Guide , 2017 .

[22]  Andrew Chi-Chih Yao,et al.  Protocols for secure computations , 1982, 23rd Annual Symposium on Foundations of Computer Science (sfcs 1982).

[23]  Jianyang Zeng,et al.  Secure multiparty computation for privacy-preserving drug discovery , 2020, Bioinform..

[24]  Giuseppe Ateniese,et al.  Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning , 2017, CCS.