Multivariate Time Series Synthesis Using Generative Adversarial Networks

Collecting and analyzing distributed (cloud) computing workloads allows for a deeper understanding of user and system behavior and is necessary for the efficient operation of infrastructures and applications. The availability of such workload data is, however, often limited, as most cloud infrastructures are commercially operated and monitoring data is considered proprietary or falls under GDPR regulations. This work investigates the generation of synthetic workloads using Generative Adversarial Networks and addresses a current need for more data and better tools for workload generation. Resource utilization measurements, such as the utilization rates of Content Delivery Network (CDN) caches, are generated, and a comparative evaluation pipeline using descriptive statistics and time-series analysis is developed to assess the statistical similarity of generated and measured workloads. We use CDN data that we have open-sourced in a data generation pipeline, as well as back-end ISP workload data, to demonstrate the multivariate synthesis capability of our approach. The work contributes a method for generating multivariate time series workloads that can provide arbitrary amounts of statistically similar data sets based on small subsets of real data. The presented technique shows promising results, in particular for heterogeneous workloads whose temporal behavior is not too irregular.
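The kind of comparison the evaluation pipeline performs can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the choice of statistics (mean, standard deviation, autocorrelation, cross-correlation), the lag window, and the stand-in data are all assumptions made for illustration, using only NumPy.

```python
import numpy as np

def descriptive_stats(x):
    """Per-column mean, standard deviation, and quartiles of a (time, feature) array."""
    x = np.asarray(x, dtype=float)
    return {
        "mean": x.mean(axis=0),
        "std": x.std(axis=0),
        "quartiles": np.percentile(x, [25, 50, 75], axis=0),
    }

def autocorrelation(series, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    s = np.asarray(series, dtype=float)
    s = s - s.mean()
    denom = np.dot(s, s)
    return np.array([np.dot(s[: len(s) - k], s[k:]) / denom
                     for k in range(max_lag + 1)])

def compare(real, synthetic, max_lag=24):
    """Gaps between measured and generated data; smaller values mean closer."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    r, s = descriptive_stats(real), descriptive_stats(synthetic)
    # Mean absolute difference of the autocorrelation functions, per feature.
    acf_gap = np.array([
        np.abs(autocorrelation(real[:, j], max_lag)
               - autocorrelation(synthetic[:, j], max_lag)).mean()
        for j in range(real.shape[1])
    ])
    # Largest difference between the cross-feature correlation matrices.
    corr_gap = np.abs(np.corrcoef(real, rowvar=False)
                      - np.corrcoef(synthetic, rowvar=False)).max()
    return {
        "mean_gap": np.abs(r["mean"] - s["mean"]),
        "std_gap": np.abs(r["std"] - s["std"]),
        "acf_gap": acf_gap,
        "corr_gap": corr_gap,
    }

if __name__ == "__main__":
    # Stand-in data (assumption): two periodic "utilization" features,
    # with a noisy copy playing the role of the generated workload.
    rng = np.random.default_rng(0)
    t = np.arange(500)
    real = np.column_stack([np.sin(2 * np.pi * t / 24),
                            np.cos(2 * np.pi * t / 24)])
    synthetic = real + rng.normal(scale=0.1, size=real.shape)
    for name, value in compare(real, synthetic).items():
        print(name, value)
```

Comparing autocorrelation and cross-correlation in addition to per-feature statistics matters for multivariate workloads, because two data sets can share the same means and variances while differing in temporal structure or in how the features move together.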
