Cost-Effective IP Trace Publishing Using Data Sketch

IP traces are sets of IP packets (or packet headers) captured at a measurement point. Publishing them is crucial for network research, but their massive size makes this challenging. In this paper, we propose a new scheme for IP trace publishing that transfers far less data than traditional methods. Based on Cisco's NetFlow technique, the data provider first summarizes an original IP trace into a sketch. During this summarization, additional statistics on certain fields of the original IP trace are collected. The sketch and the statistics, which are much smaller in size, are then published instead of the original IP trace. Using Monte Carlo simulation, the data downloader can generate from the sketch and the statistics a synthetic IP trace that preserves most of the statistical properties of the original. In our experiments, the volume of data transferred under our scheme is only 3% of that required by traditional methods, and privacy is better protected. Finally, we compare the utility of the synthetic IP trace with that of the original using two network performance metrics, throughput and round-trip time (RTT). The results show that the scheme is feasible.
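The summarize-then-regenerate pipeline described above can be illustrated with a minimal sketch. It assumes a hypothetical packet-header record (`Packet`), a NetFlow-style flow record keyed by the 5-tuple (`FlowRecord`), and a packet-size histogram as the single "extra statistic"; these names, the choice of statistic, and the uniform placement of packet timestamps within each flow are illustrative assumptions, not the paper's actual sketch format or generation model. The synthetic trace keeps each flow's key, packet count and time span, while packet sizes are drawn from the global empirical distribution, so per-flow byte totals are not matched exactly.

```python
import random
from collections import Counter, namedtuple
from dataclasses import dataclass

# Hypothetical packet-header record; field names are illustrative assumptions.
Packet = namedtuple("Packet", "ts src dst sport dport proto size")


@dataclass
class FlowRecord:
    """NetFlow-style summary of one flow, keyed by the 5-tuple (assumed format)."""
    key: tuple
    start: float
    end: float
    packets: int
    bytes: int


def summarize(trace):
    """Collapse a packet trace into flow records plus a packet-size histogram."""
    flows = {}
    size_hist = Counter()  # the assumed "extra statistic" kept alongside the sketch
    for p in trace:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        size_hist[p.size] += 1
        f = flows.get(key)
        if f is None:
            flows[key] = FlowRecord(key, p.ts, p.ts, 1, p.size)
        else:
            f.start = min(f.start, p.ts)
            f.end = max(f.end, p.ts)
            f.packets += 1
            f.bytes += p.size
    return list(flows.values()), size_hist


def synthesize(flows, size_hist, seed=None):
    """Monte Carlo regeneration of a synthetic trace from flow records and statistics."""
    rng = random.Random(seed)
    sizes = list(size_hist)
    weights = [size_hist[s] for s in sizes]
    out = []
    for f in flows:
        src, dst, sport, dport, proto = f.key
        if f.packets == 1:
            times = [f.start]
        else:
            # Pin the first and last packets to the flow boundaries and place
            # the remaining packets uniformly at random in between.
            inner = sorted(rng.uniform(f.start, f.end) for _ in range(f.packets - 2))
            times = [f.start] + inner + [f.end]
        for t in times:
            # Draw each packet size from the empirical size distribution.
            size = rng.choices(sizes, weights=weights)[0]
            out.append(Packet(t, src, dst, sport, dport, proto, size))
    out.sort(key=lambda p: p.ts)
    return out
```

In this sketch only the flow records and the histogram would be published; the downloader calls `synthesize` locally. A richer version could keep per-field statistics (for example inter-arrival times) or rescale sizes so that each flow's byte count is preserved.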
