Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data

Generative Adversarial Networks (GANs) are typically trained to synthesize data, from images and more recently tabular data, under the assumption that the training data is directly accessible. Federated learning (FL) is an emerging paradigm that instead learns from clients' local data in a decentralized, privacy-preserving manner. While training GANs to synthesize images on FL systems has recently been demonstrated, it is unknown whether GANs for tabular data can be learned from decentralized data sources, and it remains unclear which distributed architecture suits them best. Unlike image GANs, state-of-the-art tabular GANs require prior knowledge of the data distribution of each (discrete and continuous) column in order to agree on a common encoding, which risks breaking privacy guarantees. In this paper, we propose Fed-TGAN, the first Federated learning framework for Tabular GANs. To effectively learn a complex tabular GAN on non-identically distributed participants, Fed-TGAN introduces two novel features: (i) a privacy-preserving multi-source feature encoding for model initialization; and (ii) table-similarity-aware weighting strategies for aggregating local models to counter data skew. We extensively evaluate Fed-TGAN against variants of decentralized learning architectures on four widely used datasets. Results show that Fed-TGAN accelerates training per epoch by up to 200% compared to the alternative architectures, for both IID and non-IID data. Overall, Fed-TGAN not only stabilizes the training loss, but also achieves better similarity between generated and original data.
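The table-similarity-aware aggregation described above can be pictured as a FedAvg-style weighted average in which a client's weight shrinks as its local table drifts from the global column statistics. The sketch below is a minimal illustration under assumptions not spelled out in the abstract: Jensen-Shannon distance for discrete columns, Wasserstein distance for continuous ones, and an exponential mapping from divergence to weight. The helper names (`client_divergence`, `aggregate`) and the `global_stats` layout are hypothetical, not the authors' implementation.

```python
# Minimal sketch of similarity-aware weighted aggregation for a federated
# tabular GAN. Assumptions: local tables are pandas DataFrames, global
# per-column statistics have been collected beforehand, and model
# parameters are dicts of NumPy arrays.
import numpy as np
import pandas as pd
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance


def client_divergence(local_table: pd.DataFrame, global_stats: dict) -> float:
    """Average column-wise divergence between a client's local table and
    the aggregated global column statistics (hypothetical layout)."""
    divs = []
    for col, stats in global_stats.items():
        if stats["type"] == "discrete":
            # Align local category frequencies with the global category list.
            local_freq = local_table[col].value_counts(normalize=True)
            local_freq = local_freq.reindex(stats["categories"], fill_value=0.0)
            divs.append(jensenshannon(local_freq.values, stats["frequencies"]))
        else:
            # Compare continuous columns via the 1-D Wasserstein distance.
            divs.append(wasserstein_distance(local_table[col].values,
                                             stats["samples"]))
    return float(np.mean(divs))


def aggregate(client_params: list, divergences: list) -> dict:
    """FedAvg-style aggregation: clients whose tables are more similar to
    the global distribution (smaller divergence) receive larger weights."""
    scores = np.exp(-np.asarray(divergences, dtype=float))
    weights = scores / scores.sum()
    agg = {k: np.zeros_like(v, dtype=float) for k, v in client_params[0].items()}
    for w, params in zip(weights, client_params):
        for k, v in params.items():
            agg[k] += w * v
    return agg
```

The exponential mapping from divergence to weight is one plausible choice; any monotonically decreasing, normalized mapping would preserve the intent of down-weighting clients with heavily skewed local tables.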
