Creation of Synthetic Data with Conditional Generative Adversarial Networks

The generation of synthetic data is becoming a fundamental task for organizations as new data protection laws emerge. Generative Adversarial Networks (GANs) and their variants have attracted many researchers because of their elegant theoretical basis and their strong performance in generating new data [19]. The goal of synthetic data generation is to create data that performs similarly to the original dataset in analysis tasks such as classification. A limitation of GANs in classification problems is that they do not take class labels into account when generating new data; they treat the label as just another attribute. This work focuses on creating new synthetic data from the “Default of Credit Card Clients” dataset with a Conditional Generative Adversarial Network (CGAN). CGANs are an extension of GANs in which the class label is taken into account when new data is generated. We evaluate the synthetic data by comparing the results that classification algorithms obtain on the original dataset with those they obtain on the generated data.
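As a rough illustration of what taking the class label into account can mean in practice, the sketch below follows the concatenation scheme commonly used for conditional GANs: the generator receives a noise vector joined with a one-hot class label, and the discriminator receives a record joined with the same label. The function names, the noise dimension of 8, and the three-value example record are hypothetical and are not taken from this work; no network or training loop is shown, only how the conditioned inputs could be assembled.

```python
import random


def one_hot(label, num_classes):
    """Return a one-hot list encoding `label` among `num_classes` classes."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(label, num_classes=2, noise_dim=8, rng=None):
    """Build a conditioned generator input: noise z followed by one-hot label y.

    noise_dim=8 is an arbitrary illustrative value (assumption).
    """
    rng = rng or random.Random()
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + one_hot(label, num_classes)


def discriminator_input(record, label, num_classes=2):
    """Build a conditioned discriminator input: record x followed by one-hot label y.

    `record` may be a real row or a generated one; the label is attached either way.
    """
    return list(record) + one_hot(label, num_classes)


# Hypothetical usage: request a sample of class 1 (e.g. "default") and
# pair a made-up, already-scaled record with the same label.
g_in = generator_input(label=1, rng=random.Random(0))
d_in = discriminator_input([0.2, -1.3, 0.7], label=1)
print(len(g_in), g_in[-2:])
print(d_in)
```

Because the label is part of both inputs, the discriminator can penalize records that look realistic but do not match the requested class, which is what separates this setup from an unconditional GAN that treats the label as one more column.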

[1] Fei Wang, et al. Differentially Private Generative Adversarial Network, 2018, ArXiv.

[2] Christa Boer, et al. Correlation Coefficients: Appropriate Use and Interpretation, 2018, Anesthesia and Analgesia.

[3] Tianqi Chen, et al. XGBoost: A Scalable Tree Boosting System, 2016, KDD.

[4] Jimeng Sun, et al. Generating Multi-label Discrete Electronic Health Records using Generative Adversarial Networks, 2017, ArXiv.

[5] Hae-Young Kim. Statistical notes for clinical researchers: covariance and correlation, 2018, Restorative Dentistry & Endodontics.

[6] Simon Osindero, et al. Conditional Generative Adversarial Nets, 2014, ArXiv.

[7] Christoph Meinel, et al. Multi-Task Generative Adversarial Network for Handling Imbalanced Clinical Data, 2018, ArXiv.

[8] Boi Faltings, et al. Generating Differentially Private Datasets Using GANs, 2018, ArXiv.

[9] Casey S. Greene, et al. Privacy-preserving generative deep neural networks support clinical data sharing, 2017.

[10] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[11] Pavlos Protopapas, et al. T-CGAN: Conditional Generative Adversarial Network for Data Augmentation in Noisy Time Series with Irregular Sampling, 2018, ArXiv.

[12] Philip Sedgwick, et al. Pearson’s correlation coefficient, 2012, BMJ: British Medical Journal.

[13] Mihaela van der Schaar, et al. PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees, 2018, ICLR.