Adversarial Training for Privacy-Preserving Deep Learning Model Distribution

Collaboration among cancer registries is essential for developing accurate, robust, and generalizable deep learning (DL) models for automated information extraction from cancer pathology reports. However, sharing data raises serious privacy concerns, especially in biomedical research and healthcare delivery. Distributing pretrained DL models, rather than the data itself, has been proposed as a way to avoid sharing sensitive data. There is growing recognition, however, that collaboration among clinical institutions through DL model distribution exposes new security and privacy vulnerabilities. These vulnerabilities are amplified in natural language processing (NLP) applications, where the dataset vocabulary and its word vector representations must be distributed together with the other model parameters. In this paper, we propose a novel approach to privacy-preserving DL model distribution across cancer registries for information extraction from cancer pathology reports, one that addresses both privacy and confidentiality. The proposed approach uses an adversarial training framework to distinguish registry-specific (private) features from features shared among the different datasets. It shares only registry-invariant model parameters, without sharing either raw data or registry-specific model parameters among cancer registries, and thus protects both the data and the trained model simultaneously. We compare the proposed approach with single-registry models and with a model trained on centrally hosted data from multiple cancer registries. The results show that the proposed approach significantly outperforms the single-registry models and achieves micro- and macro-averaged F1 scores that are statistically indistinguishable from those of the centralized model.
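The abstract does not specify the architecture or loss functions, so the following is only a minimal NumPy sketch of the general idea, assuming a shared-private layout of the kind used in adversarial multi-task learning: one shared encoder (the only weights that are distributed), a private encoder and output layer per registry, and a registry discriminator that tries to identify the source registry from the shared features. The names `init_params`, `adversarial_losses`, `shareable_parameters`, and the weight `lam` are hypothetical, single `tanh` layers stand in for real text encoders, and only the forward losses are computed. A training loop would alternate between minimizing the discriminator loss and minimizing the encoder loss (or, equivalently, insert a gradient reversal layer between the shared encoder and the discriminator).

```python
# Minimal sketch under assumed architecture; NOT the paper's implementation.
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of integer class labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def init_params(n_registries, n_features, n_hidden, n_classes, seed=0):
    """Hypothetical layout: one shared encoder, a private encoder and output
    layer per registry, and a registry discriminator on the shared features."""
    rng = np.random.default_rng(seed)
    return {
        "shared": {"W_enc": rng.normal(0, 0.1, (n_features, n_hidden))},
        "private": {
            r: {
                "W_enc": rng.normal(0, 0.1, (n_features, n_hidden)),
                # Assumption: the output layer stays private in this sketch.
                "W_out": rng.normal(0, 0.1, (2 * n_hidden, n_classes)),
            }
            for r in range(n_registries)
        },
        "discriminator": {"W": rng.normal(0, 0.1, (n_hidden, n_registries))},
    }


def adversarial_losses(params, x, y_task, registry, lam=0.05):
    """Forward losses for one mini-batch drawn from a single registry."""
    h_shared = np.tanh(x @ params["shared"]["W_enc"])              # registry-invariant features
    h_private = np.tanh(x @ params["private"][registry]["W_enc"])  # registry-specific features
    h = np.concatenate([h_shared, h_private], axis=1)
    task_probs = softmax(h @ params["private"][registry]["W_out"])
    task_loss = cross_entropy(task_probs, y_task)

    # The discriminator tries to recover the registry from the shared features only.
    registry_labels = np.full(len(x), registry)
    disc_probs = softmax(h_shared @ params["discriminator"]["W"])
    disc_loss = cross_entropy(disc_probs, registry_labels)

    return {
        "task": task_loss,
        "discriminator": disc_loss,              # minimized by the discriminator
        "encoder": task_loss - lam * disc_loss,  # minimized by the encoders (min-max game)
    }


def shareable_parameters(params):
    """Only the registry-invariant block is sent to other registries."""
    return {"shared": params["shared"]}


if __name__ == "__main__":
    params = init_params(n_registries=2, n_features=8, n_hidden=4, n_classes=3)
    x = np.random.default_rng(1).normal(size=(5, 8))
    print(adversarial_losses(params, x, np.array([0, 1, 2, 0, 1]), registry=1))
    print(list(shareable_parameters(params)))
```

Because the shared encoder is rewarded when the discriminator fails, features that reveal the source registry tend to be pushed into the private encoder, which never leaves its registry. Which layers count as registry-invariant (for example, whether the output layer is shared) is an assumption here; the abstract does not say.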
