ciDATGAN: Conditional Inputs for Tabular GANs

Conditionality has become a core component of Generative Adversarial Networks (GANs) for generating synthetic images. GANs usually rely on latent conditionality to control the generation process. However, tabular data only contains manifest variables. Thus, latent conditionality either restricts the generated data or does not produce sufficiently good results. Therefore, we propose a new methodology for including conditionality in tabular GANs, inspired by image completion methods. This article presents ciDATGAN, an evolution of the Directed Acyclic Tabular GAN (DATGAN), which has already been shown to outperform state-of-the-art tabular GAN models. First, we show that the addition of conditional inputs does not hinder the model's performance compared to its predecessor. Then, we demonstrate that ciDATGAN can be used to unbias datasets with the help of well-chosen conditional inputs. Finally, we show that ciDATGAN can learn the logic behind the data and, thus, be used to complete large synthetic datasets using data from a smaller feeder dataset.
