Taming the Devil: Techniques for Evaluating Anonymized Network Data

Anonymization plays a key role in enabling the public release of network datasets, yet there are few, if any, methods for evaluating how much privacy network data anonymization techniques actually afford. In fact, recent work suggests that many state-of-the-art anonymization techniques may leak more information than first thought. In this paper, we propose techniques for evaluating the anonymity of network data. Specifically, we simulate the behavior of an adversary whose goal is to deanonymize objects, such as hosts or web pages, within the network data. By doing so, we are able to quantify the anonymity of the data using information-theoretic metrics, objectively compare the efficacy of anonymization techniques, and examine the impact of selective deanonymization on the anonymity of the data. Moreover, we provide several concrete applications of our approach on real network data in the hope of underscoring its usefulness to data
