Generative Synthesis of Insurance Datasets

A major impediment to advancing actuarial research and developing open-source assets for insurance analytics is the lack of realistic, publicly available datasets. In this work, we develop a workflow for synthesizing insurance datasets using CTGAN, a recently proposed neural network architecture for generating tabular data. We apply the workflow to publicly available data in two domains, general insurance pricing and life insurance shock lapse modeling, and evaluate the synthesized datasets from three perspectives: machine learning efficacy, distributions of variables, and stability of model parameters. The workflow is implemented via an R interface to promote adoption by researchers and data owners.
