Synthetic Test Data Generation Using Recurrent Neural Networks: A Position Paper

Testing in production-like test environments is an essential part of quality assurance in many industries. For information-intensive services, provisioning such environments involves setting up databases that are rich enough to simulate a wide variety of user scenarios. While production data is perhaps the gold standard here, many organizations, particularly in the public sector, are not allowed to use production data for testing due to privacy concerns. The alternatives are anonymized data or synthetically generated data. In this paper, we elaborate on these alternatives and compare them in an industrial context. We then focus on synthetic data generation and investigate the use of recurrent neural networks for this purpose. In our preliminary experiments, we were able to generate representative and highly accurate synthetic data using a recurrent neural network. These results open new research questions, which we discuss here and plan to investigate in our future work.
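To make the kind of approach investigated here more concrete, the sketch below trains a character-level LSTM, in the spirit of Graves-style sequence generation, on database rows serialized as text and then samples new, synthetic rows from it. This is a minimal sketch under our own assumptions, not the paper's implementation: the field names, the record serialization format, and all hyperparameters are hypothetical.

```python
# Minimal sketch (hypothetical setup): a character-level LSTM trained on
# database rows serialized as text, then sampled to produce synthetic rows.
import torch
import torch.nn as nn

# Hypothetical example records; a real setup would use a large export of rows.
rows = [
    "id=1;name=Alice;birth_year=1984;postcode=0150\n",
    "id=2;name=Bob;birth_year=1992;postcode=0301\n",
]
text = "".join(rows)
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

class CharLSTM(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, 32)
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Next-character prediction: the target is the input shifted one step left.
ids = torch.tensor([stoi[c] for c in text])
x, y = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)

for step in range(200):
    logits, _ = model(x)
    loss = loss_fn(logits.view(-1, len(chars)), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sample one synthetic row character by character until a newline appears.
model.eval()
with torch.no_grad():
    inp = torch.tensor([[stoi["i"]]])  # seed with the first character of a row
    state, out = None, "i"
    for _ in range(80):
        logits, state = model(inp, state)
        probs = torch.softmax(logits[0, -1], dim=-1)
        nxt = torch.multinomial(probs, 1).item()
        out += itos[nxt]
        if itos[nxt] == "\n":
            break
        inp = torch.tensor([[nxt]])
print(out)
```

In practice one would train on a much larger corpus of serialized rows and validate the generated values against the source schema and column distributions; the paper's preliminary experiments report representative and highly accurate generated data but do not prescribe this particular setup.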
