De-anonymization of Mobility Trajectories: Dissecting the Gaps between Theory and Practice

Human mobility trajectories are increasingly collected by ISPs to assist academic research and commercial applications. Meanwhile, there is a growing concern that individual trajectories can be de-anonymized when the data is shared, using information from external sources (e.g. online social networks). To understand this risk, prior works either estimate the theoretical privacy bound or simulate de-anonymization attacks on synthetically created (small) datasets. However, it is not clear how well the theoretical estimations are preserved in practice. In this paper, we collected a large-scale ground-truth trajectory dataset from 2,161,500 users of a cellular network, and two matched external trajectory datasets from a large social network (56,683 users) and a check-in/review service (45,790 users) on the same user population. The two sets of large ground-truth data provide a rare opportunity to extensively evaluate a variety of de-anonymization algorithms (7 in total). We find that their performance in the real-world dataset is far from the theoretical bound. Further analysis shows that most algorithms have underestimated the impact of spatio-temporal mismatches between the data from different sources, and the high sparsity of user generated data also contributes to the underperformance. Based on these insights, we propose 4 new algorithms that are specially designed to tolerate spatial or temporal mismatches (or both) and model user behavior. Extensive evaluations show that our algorithms achieve more than 17% performance gain over the best existing algorithms, confirming our insights.

[1]  Catuscia Palamidessi,et al.  Geo-indistinguishability: differential privacy for location-based systems , 2012, CCS.

[2]  Martin Vetterli,et al.  Where You Are Is Who You Are: User Identification by Matching Statistics , 2015, IEEE Transactions on Information Forensics and Security.

[3]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[4]  Jean-Yves Le Boudec,et al.  Quantifying Location Privacy , 2011, 2011 IEEE Symposium on Security and Privacy.

[5]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[6]  A. Tatem,et al.  Dynamic population mapping using mobile phone data , 2014, Proceedings of the National Academy of Sciences.

[7]  Cynthia Dwork,et al.  Differential Privacy: A Survey of Results , 2008, TAMC.

[8]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[9]  Prateek Mittal,et al.  On Your Social Network De-anonymizablity: Quantification and Large Scale Evaluation with Seed Knowledge , 2015, NDSS.

[10]  Wei Xiang,et al.  Big data-driven optimization for mobile networks toward 5G , 2016, IEEE Network.

[11]  E. Locard Traité de criminalistique , 1931 .

[12]  Francesco Bonchi,et al.  Never Walk Alone: Uncertainty for Anonymity in Moving Objects Databases , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[13]  B. Ripley,et al.  Pattern Recognition , 1968, Nature.

[14]  César A. Hidalgo,et al.  Unique in the Crowd: The privacy bounds of human mobility , 2013, Scientific Reports.

[15]  Silvio Lattanzi,et al.  Linking Users Across Domains with Location Data: Theory and Validation , 2016, WWW.

[16]  Sree Hari Krishnan Parthasarathi,et al.  Exploiting innocuous activity for correlating users across sites , 2013, WWW.

[17]  Krishna P. Gummadi,et al.  On the Reliability of Profile Matching Across Large Online Social Networks , 2015, KDD.

[18]  Mirco Musolesi,et al.  It's the way you check-in: identifying users in location-based social networks , 2014, COSN '14.

[19]  Albert-László Barabási,et al.  Understanding individual human mobility patterns , 2008, Nature.

[20]  Gang Wang,et al.  On the validity of geosocial mobility traces , 2013, HotNets.

[21]  Michael Hicks,et al.  Deanonymizing mobility traces: using social network as a side-channel , 2012, CCS.

[22]  Jing Liu,et al.  Survey of Wireless Indoor Positioning Techniques and Systems , 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[23]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[24]  David K. Y. Yau,et al.  Privacy vulnerability of published anonymous mobility traces , 2010, MobiCom.

[25]  Franco Zambonelli,et al.  Re-identification and information fusion between anonymized CDR and social network data , 2016, J. Ambient Intell. Humaniz. Comput..

[26]  Shouling Ji,et al.  Structural Data De-anonymization: Quantification, Practice, and Implications , 2014, CCS.

[27]  Carmela Troncoso,et al.  Back to the Drawing Board: Revisiting the Design of Optimal Location Privacy-preserving Mechanisms , 2017, CCS.

[28]  Jing Xiao,et al.  User Identity Linkage by Latent User Space Modelling , 2016, KDD.

[29]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[30]  Gang Wang,et al.  "Will Check-in for Badges": Understanding Bias and Misbehavior on Location-Based Social Networks , 2021, ICWSM.

[31]  Ahmad Rahmati,et al.  Users and Batteries: Interactions and Adaptive Energy Management in Mobile Systems , 2007, UbiComp.

[32]  Reza Zafarani,et al.  User Identity Linkage across Online Social Networks: A Review , 2017, SKDD.

[33]  George Danezis,et al.  GENERAL TERMS , 2003 .

[34]  Francesco Bonchi,et al.  Anonymization of moving objects databases by clustering and perturbation , 2010, Inf. Syst..

[35]  Vitaly Shmatikov,et al.  De-anonymizing Social Networks , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[36]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[37]  Hui Zang,et al.  Anonymization of location data does not work: a large-scale measurement study , 2011, MobiCom.

[38]  Claude Castelluccia,et al.  Study : Privacy Preserving Release of Spatio-temporal Density in Paris , 2014 .

[39]  Javier Parapar,et al.  Additive Smoothing for Relevance-Based Language Modelling of Recommender Systems , 2016, CERI.

[40]  Marco Fiore,et al.  Hiding mobile traffic fingerprints with GLOVE , 2015, CoNEXT.