Learning Vine Copula Models For Synthetic Data Generation

A vine copula model is a flexible high-dimensional dependence model that uses only bivariate building blocks. However, the number of possible configurations of a vine copula grows exponentially as the number of variables increases, making model selection a major challenge. In this work, we formulate vine structure learning as a problem with both a vector representation and a reinforcement learning representation. We use a neural network to find the embeddings for the best possible vine model and to generate a structure. Through experiments on synthetic and real-world datasets, we show that our proposed approach fits the data better in terms of log-likelihood. Moreover, we demonstrate that the model is able to generate high-quality samples in a variety of applications, making it a good candidate for synthetic data generation.
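
To give a sense of the size of the search space mentioned above, the following minimal sketch counts regular vine tree structures on d variables. It assumes the standard counting result from the vine copula literature, (d!/2) * 2^((d-2)(d-3)/2), which is not stated in this abstract; the function name `count_regular_vines` is hypothetical. The count covers tree structures only, so choosing a bivariate copula family for each edge would enlarge the space further.

```python
from math import comb, factorial


def count_regular_vines(d: int) -> int:
    """Count regular vine tree structures on d variables.

    Assumes the known closed form (d!/2) * 2^C(d-2, 2) for d >= 2.
    This is an illustration, not part of the proposed method.
    """
    if d < 1:
        raise ValueError("d must be a positive integer")
    if d <= 2:
        # With one or two variables there is only one possible structure.
        return 1
    return factorial(d) // 2 * 2 ** comb(d - 2, 2)


if __name__ == "__main__":
    # Print the count for a few dimensions to show how quickly it grows.
    for d in (3, 4, 5, 10, 20):
        print(d, count_regular_vines(d))
```

Even at moderate dimension the count is far too large for exhaustive enumeration, which is why a learned search over structures is attractive.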

[1]  Christopher Ré,et al.  Learning to Compose Domain-Specific Transformations for Data Augmentation , 2017, NIPS.

[2]  A. Frigessi,et al.  Pair-copula constructions of multiple dependence , 2009 .

[3]  Claudia Czado,et al.  Selection of sparse vine copulas in high dimensions with the Lasso , 2019, Stat. Comput..

[4]  Zoubin Ghahramani,et al.  Gaussian Process Vine Copulas for Multivariate Dependence , 2013, ICML.

[5]  Dorota Kurowicka,et al.  Dependence Modeling: Vine Copula Handbook , 2010 .

[6]  Sanjay Jain,et al.  Issues in synthetic data generation for advanced manufacturing , 2017, 2017 IEEE International Conference on Big Data (Big Data).

[7]  Tsuyoshi Murata,et al.  {m , 1934, ACML.

[8]  Bram Bakker,et al.  Reinforcement Learning with Long Short-Term Memory , 2001, NIPS.

[9]  Donald MacKenzie,et al.  ‘The formula that killed Wall Street’: The Gaussian copula and modelling practices in investment banking , 2014, Social studies of science.

[10]  Peter L. Bartlett,et al.  Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..

[11]  Gal Elidan,et al.  Copula Bayesian Networks , 2010, NIPS.

[12]  Yi Shi,et al.  Synthetic Social Media Data Generation , 2018, IEEE Transactions on Computational Social Systems.

[13]  H. Joe Families of $m$-variate distributions with given margins and $m(m-1)/2$ bivariate dependence parameters , 1996 .

[14]  Michael I. Jordan Learning in Graphical Models , 1999, NATO ASI Series.

[15]  M Ganjali,et al.  A Copula Approach to Joint Modeling of Longitudinal Measurements and Survival Times Using Monte Carlo Expectation-Maximization with Application to AIDS Studies , 2015, Journal of biopharmaceutical statistics.

[16]  Tuo Zhao,et al.  CODA: high dimensional copula discriminant analysis , 2013, J. Mach. Learn. Res..

[17]  Luis Perez,et al.  The Effectiveness of Data Augmentation in Image Classification using Deep Learning , 2017, ArXiv.

[18]  R. Nelsen An Introduction to Copulas , 1998 .

[19]  Claudia Czado,et al.  Bayesian model selection of regular vine copulas , 2017 .

[20]  Roger M. Cooke,et al.  Probability Density Decomposition for Conditionally Dependent Random Variables Modeled by Vines , 2001, Annals of Mathematics and Artificial Intelligence.

[21]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[22]  Mani B. Srivastava,et al.  SenseGen: A deep learning architecture for synthetic sensor data generation , 2017, 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops).

[23]  Nir Friedman,et al.  Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning , 2009 .

[24]  Roger M. Cooke,et al.  Sampling algorithms for generating joint uniform distributions using the vine-copula method , 2007, Comput. Stat. Data Anal..

[25]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[26]  Bhanukiran Vinzamuri,et al.  Active Learning based Survival Regression for Censored Data , 2014, CIKM.

[27]  Kalyan Veeramachaneni,et al.  Copula Graphical Models for Wind Resource Estimation , 2015, IJCAI.

[28]  Roberto Santana,et al.  Vine copula classifiers for the mind reading problem , 2016, Progress in Artificial Intelligence.

[29]  Claudia Czado,et al.  Selecting and estimating regular vine copulae and application to financial returns , 2012, Comput. Stat. Data Anal..

[30]  C. Czado,et al.  Bayesian inference for multivariate copulas using pair-copula constructions. , 2010 .

[31]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[32]  Mehrdad Sabetzadeh,et al.  Synthetic data generation for statistical testing , 2017, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[33]  Dacheng Tao,et al.  Multi-task copula by sparse graph regression , 2014, KDD.

[34]  Xiaoqian Jiang,et al.  DPSynthesizer: Differentially Private Data Synthesizer for Privacy Preserving Data Sharing , 2014, Proc. VLDB Endow..

[35]  Lutz F. Gruber,et al.  Sequential Bayesian Model Selection of Regular Vine Copulas , 2015, 1512.00976.

[36]  Delhi Paiva,et al.  Copula-based regression models: A survey , 2009 .