On the design and performance of prefix-preserving IP traffic trace anonymization

Even though real-world Internet traffic traces are crucial for network research, only a tiny percentage of the traces collected are made public. One major reason trace owners hesitate to release their traces is the concern that confidential and private information may be inferred from them. In this paper we focus on the problem of anonymizing IP addresses in a trace. More specifically, we are interested in prefix-preserving anonymization, in which the prefix relationship among IP addresses is preserved in the anonymized trace, making such a trace usable in situations where prefix relationships are important. The goal of our work is twofold. First, we analyze the security properties inherent in prefix-preserving IP address anonymization. Through the analysis of IP traffic traces, we investigate the effect of some types of attacks on the security of the prefix-preserving anonymization process. We also derive the optimal manner in which an attack should proceed, which provides a bound on the performance of attacks in general. Second, we observe that an existing scheme for prefix-preserving anonymization, TCPdpriv, has drawbacks that limit its use in a large-scale, distributed setting. We develop an alternative cryptography-based, prefix-preserving anonymization technique that addresses these drawbacks while maintaining the same level of anonymity as TCPdpriv.
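
To make the prefix-preserving property concrete, the following is a minimal illustrative sketch, not the scheme developed in this paper. It assumes the usual reading of the property: if two original addresses share exactly a k-bit prefix, their anonymized forms also share exactly a k-bit prefix. The function names `anonymize_ipv4` and `common_prefix_len`, the use of HMAC-SHA256 as the keyed pseudorandom function, and the example key are all assumptions made for illustration.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Illustrative prefix-preserving mapping (hypothetical helper).

    Output bit i is input bit i XOR one pseudorandom bit that depends
    only on the key and on the first i input bits. HMAC-SHA256 is an
    assumed stand-in for the keyed function, chosen for convenience.
    """
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        # The flip bit is a function of the preceding i bits only, so
        # addresses with a common prefix get identical flips on it.
        prefix = bits[:i].encode("ascii")
        digest = hmac.new(key, prefix, hashlib.sha256).digest()
        flip = digest[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()


if __name__ == "__main__":
    key = b"example-key-for-illustration-only"  # assumed key
    a, b = "192.168.1.10", "192.168.1.77"
    ea, eb = anonymize_ipv4(a, key), anonymize_ipv4(b, key)
    # The shared-prefix length is the same before and after the mapping.
    print(common_prefix_len(a, b), common_prefix_len(ea, eb))
```

Because the mapping is determined entirely by the key, two sites that hold the same key would anonymize a given address identically in this sketch, which is the kind of consistency a large-scale, distributed setting calls for.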