Practical GAN-based synthetic IP header trace generation using NetShare

We explore the feasibility of using Generative Adversarial Networks (GANs) to automatically learn generative models that produce synthetic packet and flow header traces for networking tasks (e.g., telemetry, anomaly detection, provisioning). We identify key fidelity, scalability, and privacy challenges and tradeoffs in existing GAN-based approaches. By combining domain-specific insights with recent advances in machine learning and privacy, we identify design choices that tackle these challenges. Building on these insights, we develop an end-to-end framework, NetShare. We evaluate NetShare on six diverse packet header traces and find that: (1) across all distributional metrics and traces, it achieves 46% higher accuracy than baselines, and (2) it meets users' requirements for downstream tasks in terms of both the accuracy and the rank ordering of candidate approaches.
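As an illustration of the two kinds of evaluation mentioned above, the following is a minimal sketch and not NetShare's actual evaluation code. The abstract does not name its distributional metrics, so Jensen-Shannon divergence over a categorical header field is assumed here as one plausible choice; the function names `js_divergence` and `same_rank_order`, the example field values, and the example scores are all hypothetical.

```python
import math
from collections import Counter


def js_divergence(real, synthetic):
    """Jensen-Shannon divergence (base 2, so in [0, 1]) between the
    empirical distributions of two sequences of categorical values,
    e.g. the protocol field of real vs. synthetic packet headers.
    Assumed metric; the abstract does not specify one."""
    if not real or not synthetic:
        raise ValueError("both traces must be non-empty")
    p_counts, q_counts = Counter(real), Counter(synthetic)
    n_p, n_q = len(real), len(synthetic)
    jsd = 0.0
    for value in set(p_counts) | set(q_counts):
        p = p_counts[value] / n_p      # Counter yields 0 for unseen values
        q = q_counts[value] / n_q
        m = 0.5 * (p + q)              # mixture distribution
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd


def same_rank_order(real_scores, synthetic_scores):
    """True if candidate approaches are ranked identically (best first)
    when scored on the real trace and on the synthetic trace.
    Both arguments map a hypothetical approach name to its score."""
    def ranking(scores):
        return sorted(scores, key=scores.get, reverse=True)
    return ranking(real_scores) == ranking(synthetic_scores)


if __name__ == "__main__":
    real_proto = ["tcp", "tcp", "udp", "tcp"]
    synth_proto = ["tcp", "udp", "udp", "tcp"]
    print("JSD on protocol field:", js_divergence(real_proto, synth_proto))

    real = {"sketch_a": 0.91, "sketch_b": 0.85, "sketch_c": 0.60}
    synth = {"sketch_a": 0.88, "sketch_b": 0.80, "sketch_c": 0.65}
    print("rank order preserved:", same_rank_order(real, synth))
```

A lower divergence indicates that the synthetic field distribution is closer to the real one. The rank-order check captures the downstream-task requirement in its simplest form: a synthetic trace is useful for comparing candidate approaches if it leads to the same ordering as the real trace, even when absolute scores differ.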
