DART: De-Anonymization of personal gazetteers through social trajectories

Abstract Interest in trajectory data has grown considerably since mobile devices became widespread. Simple clustering techniques can recover personal gazetteers, i.e., the set of main points of interest (also called stay points) of each user, together with the list of time instants of each visit. Because they are sensitive, personal gazetteers are usually anonymized, but their inherently unique patterns expose them to the risk of de-anonymization. In particular, an adversary can leverage social trajectories (i.e., those obtained from social networks, which associate statuses and check-ins with spatial and temporal locations) to de-anonymize personal gazetteers. In this paper, we propose DART, an approach that effectively de-anonymizes personal gazetteers through social trajectories, even when the two sources are not temporally aligned (i.e., they were collected over different periods). DART relies on a big data implementation that scales to large volumes of data. We evaluate our approach on two real-world datasets and compare it with recent state-of-the-art algorithms to verify its effectiveness.
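To make the notion of a personal gazetteer concrete, the following minimal sketch groups one user's GPS samples into stay points, each with its list of visit times. It is an illustration only and not the DART method: the grid-based grouping, the names `StayPoint` and `build_gazetteer`, and the parameters `cell_deg` and `min_visits` are all assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StayPoint:
    """One point of interest with the time instants of its visits."""
    lat: float
    lon: float
    visits: list = field(default_factory=list)


def build_gazetteer(points, cell_deg=0.001, min_visits=3):
    """Group (lat, lon, timestamp) samples into stay points.

    Hypothetical sketch: samples are bucketed into square grid cells of
    side `cell_deg` degrees; cells with fewer than `min_visits` samples
    are discarded as noise.
    """
    cells = defaultdict(list)
    for lat, lon, ts in points:
        key = (round(lat / cell_deg), round(lon / cell_deg))
        cells[key].append((lat, lon, ts))

    gazetteer = []
    for samples in cells.values():
        if len(samples) < min_visits:
            continue
        # The stay point is placed at the centroid of the cell's samples.
        lat = sum(s[0] for s in samples) / len(samples)
        lon = sum(s[1] for s in samples) / len(samples)
        visits = sorted(s[2] for s in samples)
        gazetteer.append(StayPoint(lat, lon, visits))

    # Most visited places first.
    gazetteer.sort(key=lambda sp: len(sp.visits), reverse=True)
    return gazetteer


# Usage: three samples near one place and one isolated sample.
samples = [
    (45.0001, 9.0001, 10),
    (45.0002, 9.0002, 5),
    (45.0003, 9.0003, 20),
    (46.0, 10.0, 1),
]
gazetteer = build_gazetteer(samples)
```

With these inputs the isolated sample is dropped and a single stay point remains, carrying its three visit times in ascending order. A density-based clustering step could replace the grid bucketing without changing the shape of the output.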
