From Fingerprint to Footprint: Revealing Physical World Privacy Leakage by Cyberspace Cookie Logs

It is well-known that online services resort to various cookies to track users through users' online service identifiers (IDs) - in other words, when users access online services, various "fingerprints" are left behind in the cyberspace. As they roam around in the physical world while accessing online services via mobile devices, users also leave a series of "footprints" -- i.e., hints about their physical locations - in the physical world. This poses a potent new threat to user privacy: one can potentially correlate the "fingerprints" left by the users in the cyberspace with "footprints" left in the physical world to infer and reveal leakage of user physical world privacy, such as frequent user locations or mobility trajectories in the physical world - we refer to this problem as user physical world privacy leakage via user cyberspace privacy leakage. In this paper we address the following fundamental question: what kind - and how much - of user physical world privacy might be leaked if we could get hold of such diverse network datasets even without any physical location information. In order to conduct an in-depth investigation of these questions, we utilize the network data collected via a DPI system at the routers within one of the largest Internet operator in Shanghai, China over a duration of one month. We decompose the fundamental question into the three problems: i) linkage of various online user IDs belonging to the same person via mobility pattern mining; ii) physical location classification via aggregate user mobility patterns over time; and iii) tracking user physical mobility. By developing novel and effective methods for solving each of these problems, we demonstrate that the question of user physical world privacy leakage via user cyberspace privacy leakage is not hypothetical, but indeed poses a real potent threat to user privacy.

[1]  Xing Xie,et al.  Web resource geographic location classification and detection , 2005, WWW '05.

[2]  Stéphane Bressan,et al.  Not So Unique in the Crowd: a Simple and Effective Algorithm for Anonymizing Location Data , 2014, PIR@SIGIR.

[3]  T. Graepel,et al.  Private traits and attributes are predictable from digital records of human behavior , 2013, Proceedings of the National Academy of Sciences.

[4]  Josep Domingo-Ferrer,et al.  Microaggregation- and permutation-based anonymization of movement data , 2012, Inf. Sci..

[5]  Balachander Krishnamurthy,et al.  On the leakage of personally identifiable information via online social networks , 2009, CCRV.

[6]  Benjamin C. M. Fung,et al.  Walking in the crowd: anonymizing trajectory data for pattern analysis , 2009, CIKM.

[7]  Krishna P. Gummadi,et al.  On the Reliability of Profile Matching Across Large Online Social Networks , 2015, KDD.

[8]  KorulaNitish,et al.  An efficient reconciliation algorithm for social networks , 2014, VLDB 2014.

[9]  Huan Liu,et al.  Modeling temporal effects of human mobile behavior on location-based social networks , 2013, CIKM.

[10]  Bobby Bhattacharjee,et al.  Persona: an online social network with user-defined privacy , 2009, SIGCOMM '09.

[11]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[12]  Hui Zang,et al.  Anonymization of location data does not work: a large-scale measurement study , 2011, MobiCom.

[13]  Johan A. K. Suykens,et al.  Least Squares Support Vector Machine Classifiers , 1999, Neural Processing Letters.

[14]  Silvio Lattanzi,et al.  An efficient reconciliation algorithm for social networks , 2013, Proc. VLDB Endow..

[15]  Kristen LeFevre,et al.  Privacy wizards for social networking sites , 2010, WWW '10.

[16]  Mirco Musolesi,et al.  It's the way you check-in: identifying users in location-based social networks , 2014, COSN '14.

[17]  Aniket Kittur,et al.  Bridging the gap between physical location and online social networks , 2010, UbiComp.

[18]  Stéphane Bressan,et al.  Discretionary social network data revelation with a user-centric utility guarantee , 2012, CIKM.

[19]  Marco Fiore,et al.  Hiding mobile traffic fingerprints with GLOVE , 2015, CoNEXT.

[20]  Aleksandar Kuzmanovic,et al.  Measuring serendipity: connecting people, locations and interests in a mobile 3G network , 2009, IMC '09.

[21]  Jahna Otterbacher,et al.  Inferring gender of movie reviewers: exploiting writing style, content and metadata , 2010, CIKM.

[22]  Tetsuji Satoh,et al.  Protection of Location Privacy using Dummies for Location-based Services , 2005, 21st International Conference on Data Engineering Workshops (ICDEW'05).

[23]  Balachander Krishnamurthy,et al.  WWW 2009 MADRID! Track: Security and Privacy / Session: Web Privacy Privacy Diffusion on the Web: A Longitudinal Perspective , 2022 .

[24]  Matthias Grossglauser,et al.  Growing a Graph Matching from a Handful of Seeds , 2015, Proc. VLDB Endow..

[25]  César A. Hidalgo,et al.  Unique in the Crowd: The privacy bounds of human mobility , 2013, Scientific Reports.

[26]  Silvio Lattanzi,et al.  Linking Users Across Domains with Location Data: Theory and Validation , 2016, WWW.

[27]  Krishna P. Gummadi,et al.  Analyzing facebook privacy settings: user expectations vs. reality , 2011, IMC '11.

[28]  Albert-László Barabási,et al.  Understanding individual human mobility patterns , 2008, Nature.

[29]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[30]  Albert-László Barabási,et al.  Limits of Predictability in Human Mobility , 2010, Science.

[31]  Balachander Krishnamurthy,et al.  Generating a privacy footprint on the internet , 2006, IMC '06.

[32]  Chaoming Song,et al.  Modelling the scaling properties of human mobility , 2010, 1010.0436.

[33]  Sree Hari Krishnan Parthasarathi,et al.  Exploiting innocuous activity for correlating users across sites , 2013, WWW.

[34]  Aleksandar Kuzmanovic,et al.  Mosaic: quantifying privacy leakage in mobile networks , 2013, SIGCOMM.

[35]  Walid Dabbous,et al.  I know where you are and what you are sharing: exploiting P2P communications to invade users' privacy , 2011, IMC '11.

[36]  Xiaoli Li,et al.  Where you Instagram?: Associating Your Instagram Photos with Points of Interest , 2015, CIKM.

[37]  Anna Monreale,et al.  Movement data anonymity through generalization , 2009, SPRINGL '09.

[38]  Jure Leskovec,et al.  Friendship and mobility: user movement in location-based social networks , 2011, KDD.

[39]  Balachander Krishnamurthy,et al.  Privacy leakage vs . Protection measures : the growing disconnect , 2011 .

[40]  Hui Zang,et al.  Are call detail records biased for sampling human mobility? , 2012, MOCO.

[41]  Arnaud Legout,et al.  ReCon: Revealing and Controlling Privacy Leaks in Mobile Network Traffic , 2015, ArXiv.

[42]  Arnaud Legout,et al.  ReCon: Revealing and Controlling PII Leaks in Mobile Network Traffic , 2015, MobiSys.

[43]  Y. de Montjoye,et al.  Unique in the shopping mall: On the reidentifiability of credit card metadata , 2015, Science.