Quantifying Privacy Loss of Human Mobility Graph Topology

Abstract Human mobility is often represented as a mobility network, or graph, with nodes representing places of significance which an individual visits, such as their home, work, places of social amenity, etc., and edge weights corresponding to probability estimates of movements between these places. Previous research has shown that individuals can be identified by a small number of geolocated nodes in their mobility network, rendering mobility trace anonymization a hard task. In this paper we build on prior work and demonstrate that even when all location and timestamp information is removed from nodes, the graph topology of an individual mobility network itself is often uniquely identifying. Further, we observe that a mobility network is often unique, even when only a small number of the most popular nodes and edges are considered. We evaluate our approach using a large dataset of cell-tower location traces from 1 500 smartphone handsets with a mean duration of 430 days. We process the data to derive the top−N places visited by the device in the trace, and find that 93% of traces have a unique top−10 mobility network, and all traces are unique when considering top−15 mobility networks. Since mobility patterns, and therefore mobility networks for an individual, vary over time, we use graph kernel distance functions, to determine whether two mobility networks, taken at different points in time, represent the same individual. We then show that our distance metrics, while imperfect predictors, perform significantly better than a random strategy and therefore our approach represents a significant loss in privacy.

[1]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[2]  Mirco Musolesi,et al.  Privacy and the City: User Identification and Location Semantics in Location-Based Social Networks , 2015, ICWSM.

[3]  Hans-Peter Kriegel,et al.  Graph Kernels For Disease Outcome Prediction From Protein-Protein Interaction Networks , 2006, Pacific Symposium on Biocomputing.

[4]  David Haussler,et al.  Convolution kernels on discrete structures , 1999 .

[5]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[6]  Berker Agir,et al.  On the Privacy Implications of Location Semantics , 2016, Proc. Priv. Enhancing Technol..

[7]  Ingo Scholtes,et al.  When is a Network a Network?: Multi-Order Graphical Model Selection in Pathways and Temporal Networks , 2017, KDD.

[8]  Carmela Troncoso,et al.  Unraveling an old cloak: k-anonymity for location privacy , 2010, WPES '10.

[9]  Martin Vetterli,et al.  Where You Are Is Who You Are: User Identification by Matching Statistics , 2015, IEEE Transactions on Information Forensics and Security.

[10]  Emiliano De Cristofaro,et al.  What Does The Crowd Say About You? Evaluating Aggregation-based Location Privacy , 2017, Proc. Priv. Enhancing Technol..

[11]  Brendan D. McKay,et al.  Practical graph isomorphism, II , 2013, J. Symb. Comput..

[12]  Pinar Yanardag,et al.  Deep Graph Kernels , 2015, KDD.

[13]  Alexander Markowetz,et al.  Differentiating smartphone users by app usage , 2016, UbiComp.

[14]  Hui Zang,et al.  Anonymization of location data does not work: a large-scale measurement study , 2011, MobiCom.

[15]  Morse Steven,et al.  Persistent cascades: Measuring fundamental communication structure in social networks , 2016 .

[16]  César A. Hidalgo,et al.  Unique in the Crowd: The privacy bounds of human mobility , 2013, Scientific Reports.

[17]  Gaetano Borriello,et al.  Extracting places from traces of locations , 2004, MOCO.

[18]  Marco Gruteser,et al.  USENIX Association , 1992 .

[19]  Ambuj K. Singh,et al.  Mining Heavy Subgraphs in Time-Evolving Networks , 2011, 2011 IEEE 11th International Conference on Data Mining.

[20]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[21]  George Danezis,et al.  An Automated Social Graph De-anonymization Technique , 2014, WPES.

[22]  Martín Abadi,et al.  Host Fingerprinting and Tracking on the Web: Privacy and Security Implications , 2012, NDSS.

[23]  Imad Aad,et al.  The Mobile Data Challenge: Big Data for Mobile Computing Research , 2012 .

[24]  Alastair R. Beresford,et al.  Device Analyzer: Understanding Smartphone Usage , 2013, MobiQuitous.

[25]  Lise Getoor,et al.  To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles , 2009, WWW '09.

[26]  Kevin Chen-Chuan Chang,et al.  Mobile user verification/identification using statistical mobility profile , 2015, 2015 International Conference on Big Data and Smart Computing (BIGCOMP).

[27]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[28]  Nitesh V. Chawla,et al.  Representing higher-order dependencies in networks , 2015, Science Advances.

[29]  Hans-Peter Kriegel,et al.  Shortest-path kernels on graphs , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[30]  Philip S. Yu,et al.  A General Survey of Privacy-Preserving Data Mining Models and Algorithms , 2008, Privacy-Preserving Data Mining.

[31]  Siddharth Srivastava,et al.  Anonymizing Social Networks , 2007 .

[32]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[33]  George Danezis,et al.  GENERAL TERMS , 2003 .

[34]  Xiaoming Fu,et al.  Trajectory Recovery From Ash: User Privacy Is NOT Preserved in Aggregated Mobility Data , 2017, WWW.

[35]  Claude Castelluccia,et al.  On the uniqueness of Web browsing history patterns , 2014, Ann. des Télécommunications.

[36]  Sébastien Gambs,et al.  De-anonymization Attack on Geolocated Data , 2013, 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications.

[37]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[38]  Philip S. Yu,et al.  Discriminative frequent subgraph mining with optimality guarantees , 2010, Stat. Anal. Data Min..

[39]  A. Pfitzmann,et al.  A terminology for talking about privacy by data minimization: Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management , 2010 .

[40]  Philippe Golle,et al.  On the Anonymity of Home/Work Location Pairs , 2009, Pervasive.

[41]  Kurt Mehlhorn,et al.  Weisfeiler-Lehman Graph Kernels , 2011, J. Mach. Learn. Res..

[42]  Vitaly Shmatikov,et al.  De-anonymizing Social Networks , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[43]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[44]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.