Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

Limited data access is a longstanding barrier to data-driven research and development in the networked systems community. In this work, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge. As a specific target, our focus in this paper is on time series datasets with metadata (e.g., packet loss rate measurements with corresponding ISPs). We identify key challenges of existing GAN approaches for such workloads with respect to fidelity (e.g., long-term dependencies, complex multidimensional relationships, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity). To improve fidelity, we design a custom workflow called DoppelGANger (DG) and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DG achieves up to 43% better fidelity than baseline models. Although we do not resolve the privacy problem in this work, we identify fundamental challenges with both classical notions of privacy and recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges. By shedding light on the promise and challenges, we hope our work can rekindle the conversation on workflows for data sharing.
