Reducing bias and increasing utility by federated generative modeling of medical images using a centralized adversary

A major roadblock in machine learning for healthcare is that data cannot be shared broadly because of privacy concerns. Privacy-preserving synthetic data generation is increasingly seen as a solution to this problem. However, healthcare data often has significant site-specific biases, which has motivated the use of federated learning when the goal is to use data from multiple sites for machine learning model training. Here, we introduce FELICIA (FEderated LearnIng with a CentralIzed Adversary), a generative mechanism enabling collaborative learning. It is a generalized extension of the (local) PrivGAN mechanism that accounts for the diverse (non-IID) nature of the federated sites. In particular, we show how a site with limited and biased data can benefit from other sites while the data from all sources is kept private. FELICIA works for a large family of Generative Adversarial Network (GAN) architectures, including vanilla and conditional GANs, as demonstrated in this work. We show that by using the FELICIA mechanism, a site with a limited number of images can generate high-quality synthetic images with improved utility, while no site needs to provide access to its real data. The sharing happens solely through a central discriminator whose access is limited to synthetic data. We demonstrate these benefits in several realistic healthcare scenarios using benchmark image datasets (MNIST, CIFAR-10) as well as medical images for the task of skin lesion classification. We show that the utility of synthetic images generated by FELICIA surpasses that of the data available locally, and we demonstrate that it can correct the reduced utility of a biased subgroup within a class.
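
The role of the central discriminator can be illustrated with a small sketch. The abstract does not state the loss functions, so everything below is an assumption loosely modeled on a privGAN-style objective: each site's generator tries to fool its own local discriminator, while a central adversary that sees only synthetic samples tries to guess which site produced them. The function names, the weighting parameter `lam`, and the exact form of the loss are hypothetical and are not taken from the paper.

```python
import math

EPS = 1e-12  # guards against log(0)


def bce(p, label):
    """Binary cross-entropy for one probability p and a label in {0, 1}."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def cross_entropy(probs, target):
    """Cross-entropy of a categorical prediction against the true index."""
    return -math.log(max(probs[target], EPS))


def generator_loss(local_d_prob_real, central_probs, site_index, lam):
    """Hypothetical combined loss for the generator at one site.

    local_d_prob_real: probability the site's local discriminator assigns
        to "real" for a synthetic sample (uses local real data only).
    central_probs: the central adversary's predicted distribution over
        sites for the same synthetic sample (never sees real data).
    lam: assumed weight of the central term.
    """
    # Term 1: fool the local discriminator into calling the sample real.
    fool_local = bce(local_d_prob_real, 1)
    # Term 2: the generator is rewarded when the central adversary is
    # wrong about the source site, so its cross-entropy is subtracted.
    central_ce = cross_entropy(central_probs, site_index)
    return fool_local - lam * central_ce


# Two sites; a sample from site 0 that the central adversary cannot place.
uncertain = generator_loss(0.5, [0.5, 0.5], site_index=0, lam=1.0)
# The same sample when the central adversary confidently identifies site 0.
identified = generator_loss(0.5, [0.9, 0.1], site_index=0, lam=1.0)
```

In this sketch the generator's loss is higher when the central adversary can tell which site a synthetic image came from, which is one way a shared adversary could discourage generators from reproducing site-specific training data while only synthetic samples leave each site.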
