De-anonymization attack on geolocated datasets

With the advent of GPS-equipped devices, a massive amount of location data is being collected, raising the issue of the privacy risks incurred by the individuals whose movements are recorded. In this work, we focus on a specific inference attack called the de-anonymization attack, by which an adversary tries to infer the identity of a particular individual behind a set of mobility traces. More specifically, we propose an implementation of this attack based on a mobility model called Mobility Markov Chain (MMC). A MMC is built out from the mobility traces observed during the training phase and is used to perform the attack during the testing phase. We design two distance metrics quantifying the closeness between two MMCs and combine these distances to build de-anonymizers that can re-identify users in an anonymized geolocated dataset. Experiments conducted on real datasets demonstrate that the attack is both accurate and resilient to sanitization mechanisms such as downsampling. Keywords-Privacy, geolocation, inference attack, deanonymization.

[1]  Albert-László Barabási,et al.  Understanding individual human mobility patterns , 2008, Nature.

[2]  Albert-László Barabási,et al.  Limits of Predictability in Human Mobility , 2010, Science.

[3]  Jaana Kekäläinen,et al.  Cumulated gain-based evaluation of IR techniques , 2002, TOIS.

[4]  Wei-Ying Ma,et al.  Understanding mobility based on GPS data , 2008, UbiComp.

[5]  David K. Y. Yau,et al.  Privacy vulnerability of published anonymous mobility traces , 2010, MobiCom.

[6]  Matthias Grossglauser,et al.  CRAWDAD dataset epfl/mobility (v.2009-02-24) , 2009 .

[7]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[8]  Sushil Jajodia,et al.  Protecting Privacy Against Location-Based Personal Identification , 2005, Secure Data Management.

[9]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[10]  Thad Starner,et al.  Learning Significant Locations and Predicting User Movement with GPS , 2002, Proceedings. Sixth International Symposium on Wearable Computers,.

[11]  George Danezis,et al.  GENERAL TERMS , 2003 .

[12]  John Krumm,et al.  Inference Attacks on Location Tracks , 2007, Pervasive.

[13]  Xing Xie,et al.  GeoLife: A Collaborative Social Networking Service among User, Location and Trajectory , 2010, IEEE Data Eng. Bull..

[14]  Xing Xie,et al.  Finding similar users using category-based location history , 2010, GIS '10.

[15]  B. Nuseibeh,et al.  Know What You Did Last Summer : risks of location data leakage in mobile and social computing , 2009 .

[16]  Reza Shokri,et al.  Evaluating the Privacy Risk of Location-Based Services , 2011, Financial Cryptography.

[17]  Stefano Spaccapietra,et al.  SeMiTri: a framework for semantic annotation of heterogeneous trajectories , 2011, EDBT/ICDT '11.

[18]  Alex Pentland,et al.  Reality mining: sensing complex social systems , 2006, Personal and Ubiquitous Computing.

[19]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[20]  Hui Zang,et al.  Anonymization of location data does not work: a large-scale measurement study , 2011, MobiCom.

[21]  Sébastien Gambs,et al.  Show me how you move and I will tell you who you are , 2010, SPRINGL '10.

[22]  Jean-Yves Le Boudec,et al.  Quantifying Location Privacy , 2011, 2011 IEEE Symposium on Security and Privacy.