Statistical matching in the presence of anonymization and obfuscation: Non-asymptotic results in the discrete case

Many popular applications use traces of user data to tailor services to their users, but this practice carries a significant risk to user privacy. In particular, even if user traces are anonymized, an adversary can statistically match them to prior user behavior to identify the users and their behavior. Because of this threat, significant recent work has explored the theoretical foundations of the problem in the limit of a large number of users and/or observations, where the asymptotic nature of the analysis allows for clean analytical results. In this paper, we turn attention to exact performance analysis for a finite number of users and observations. We consider the case where each user is distributed over a discrete set of states according to a probability distribution drawn at random, which we assume is known to the adversary from analysis of past user behavior. The finite-length traces are then anonymized and obfuscated at a cost in user utility. We analyze the ability of the adversary to correctly identify user data samples as a function of the rate of anonymization and the degree of obfuscation, and we arrive at expressions that are complicated yet readily evaluated numerically. These results allow us to investigate interesting questions left open by the asymptotic nature of previous work.
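
The setting described above can be illustrated with a small Monte Carlo sketch. It is not the exact analysis of the paper; it only simulates one possible instance of the model under assumptions that the abstract does not specify: profiles are drawn from a flat Dirichlet prior, obfuscation replaces each sample independently with a uniformly random state with probability `noise`, anonymization is a uniformly random relabeling of the traces, and the adversary performs maximum-likelihood matching by brute force over all permutations (feasible only for a handful of users). All function and parameter names are hypothetical.

```python
import itertools
import math
import random


def draw_profile(r, rng):
    """Draw a distribution over r states from a flat Dirichlet (assumed prior)."""
    g = [rng.expovariate(1.0) for _ in range(r)]
    total = sum(g)
    return [x / total for x in g]


def obfuscate(trace, r, noise, rng):
    """With probability `noise`, replace a sample by a uniform state (assumed model)."""
    return [rng.randrange(r) if rng.random() < noise else s for s in trace]


def log_likelihood(trace, profile, r, noise):
    """Log-likelihood of an obfuscated trace under one user's known profile."""
    return sum(math.log((1 - noise) * profile[s] + noise / r) for s in trace)


def run_trial(n, r, m, noise, rng):
    """Simulate one round; return the fraction of traces matched to the right user."""
    profiles = [draw_profile(r, rng) for _ in range(n)]
    traces = [
        obfuscate(rng.choices(range(r), weights=p, k=m), r, noise, rng)
        for p in profiles
    ]
    hidden = list(range(n))
    rng.shuffle(hidden)  # anonymization: published[j] belongs to user hidden[j]
    published = [traces[u] for u in hidden]
    # score[j][u] = log-likelihood that published trace j came from user u
    score = [[log_likelihood(t, p, r, noise) for p in profiles] for t in published]
    # Maximum-likelihood assignment by exhaustive search over permutations.
    best = max(
        itertools.permutations(range(n)),
        key=lambda g: sum(score[j][g[j]] for j in range(n)),
    )
    return sum(b == h for b, h in zip(best, hidden)) / n


def estimate_accuracy(n=3, r=4, m=20, noise=0.1, trials=200, seed=0):
    """Average matching accuracy over independent trials."""
    rng = random.Random(seed)
    return sum(run_trial(n, r, m, noise, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for noise in (0.0, 0.3, 0.6, 0.9):
        print(noise, estimate_accuracy(noise=noise))
```

Sweeping `noise` and the trace length `m` in such a simulation gives an empirical view of how adversary accuracy depends on obfuscation and on the number of observations, which is the kind of finite-regime question the exact expressions are meant to answer.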
