Generative Adversarial Networks for Synthetic Data Generation: A Comparative Study

Generative Adversarial Networks (GANs) are gaining increasing attention as a means of synthesising data. So far, much of this work has been applied to use cases outside the data confidentiality domain, a common application being the production of artificial images. Here we consider the potential application of GANs to generating synthetic census microdata. We employ a battery of utility metrics and a disclosure risk metric (the Targeted Correct Attribution Probability) to compare the data produced by tabular GANs with the data produced by orthodox data synthesis methods.
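The abstract names the Targeted Correct Attribution Probability (TCAP) as its disclosure risk metric but does not define it. The block below is a minimal sketch of one commonly described formulation, not the authors' implementation. An intruder who knows a set of key variables looks for synthetic records whose key combination maps to a single target value (within-equivalence-class attribution probability equal to 1). For each such "targeted" record, the score is the proportion of original records with the same key that actually share that target value, and TCAP is the mean of those scores. Assumptions: records are plain Python dicts, scores are averaged over synthetic records, synthetic keys absent from the original data are skipped (some variants score them as 0), and the function name `tcap` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Mapping, Sequence


def tcap(original: Sequence[Mapping[str, object]],
         synthetic: Sequence[Mapping[str, object]],
         keys: Sequence[str], target: str) -> float:
    """Sketch of the Targeted Correct Attribution Probability (hypothetical helper)."""

    def key_of(rec):
        # The key variables an intruder is assumed to know, as a hashable tuple.
        return tuple(rec[k] for k in keys)

    # Frequencies of key and (key, target) combinations in each dataset.
    syn_key = Counter(key_of(r) for r in synthetic)
    syn_kt = Counter((key_of(r), r[target]) for r in synthetic)
    orig_key = Counter(key_of(r) for r in original)
    orig_kt = Counter((key_of(r), r[target]) for r in original)

    scores = []
    for r in synthetic:
        k, t = key_of(r), r[target]
        # Targeted records only: every synthetic record sharing this key has the
        # same target value (attribution probability within the class equals 1).
        if syn_kt[(k, t)] != syn_key[k]:
            continue
        # Assumption: keys that never occur in the original data are skipped.
        if orig_key[k] == 0:
            continue
        # Correct attribution probability against the original data.
        scores.append(orig_kt[(k, t)] / orig_key[k])

    # Assumption: return 0.0 when no synthetic record is targeted.
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy example: keys are age band and sex, the target is occupation.
original = [
    {"age": "30-39", "sex": "F", "occ": "nurse"},
    {"age": "30-39", "sex": "F", "occ": "nurse"},
    {"age": "30-39", "sex": "F", "occ": "teacher"},
    {"age": "40-49", "sex": "M", "occ": "driver"},
]
synthetic = [
    {"age": "30-39", "sex": "F", "occ": "nurse"},
    {"age": "30-39", "sex": "F", "occ": "nurse"},
    {"age": "40-49", "sex": "M", "occ": "driver"},
    {"age": "40-49", "sex": "M", "occ": "clerk"},
    {"age": "50-59", "sex": "F", "occ": "chef"},
]
print(round(tcap(original, synthetic, ["age", "sex"], "occ"), 4))  # prints 0.6667
```

In the toy data, the two synthetic ("30-39", "F") records are targeted (both say "nurse") and two of the three matching original records are nurses, giving 2/3. The ("40-49", "M") class is ambiguous in the synthetic data and is ignored, and the ("50-59", "F") key has no original match. Higher TCAP values indicate that targeted attributions made from the synthetic data are more often correct for the original data, that is, higher disclosure risk.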
