De-anonymizing Web Browsing Data with Social Networks

Can online trackers and network adversaries de-anonymize web browsing data readily available to them? We show---theoretically, via simulation, and through experiments on real user data---that de-identified web browsing histories can be linked to social media profiles using only publicly available data. Our approach is based on a simple observation: each person has a distinctive social network, and thus the set of links appearing in one's feed is unique. Assuming users visit links in their feed with higher probability than a random user, browsing histories contain tell-tale marks of identity. We formalize this intuition by specifying a model of web browsing behavior and then deriving the maximum likelihood estimate of a user's social profile. We evaluate this strategy on simulated browsing histories, and show that given a history with 30 links originating from Twitter, we can deduce the corresponding Twitter profile more than 50% of the time.To gauge the real-world effectiveness of this approach, we recruited nearly 400 people to donate their web browsing histories, and we were able to correctly identify more than 70% of them. We further show that several online trackers are embedded on sufficiently many websites to carry out this attack with high accuracy. Our theoretical contribution applies to any type of transactional data and is robust to noisy observations, generalizing a wide range of previous de-anonymization attacks. Finally, since our attack attempts to find the correct Twitter profile out of over 300 million candidates, it is---to our knowledge---the largest-scale demonstrated de-anonymization to date.

[1]  John C. Mitchell,et al.  Third-Party Web Tracking: Policy and Technology , 2012, 2012 IEEE Symposium on Security and Privacy.

[2]  César A. Hidalgo,et al.  Unique in the Crowd: The privacy bounds of human mobility , 2013, Scientific Reports.

[3]  Edward W. Felten,et al.  Cookies That Give You Away: The Surveillance Implications of Web Tracking , 2015, WWW.

[4]  Dawn Xiaodong Song,et al.  On the Feasibility of Internet-Scale Author Identification , 2012, 2012 IEEE Symposium on Security and Privacy.

[5]  Catherine Tucker,et al.  Government Surveillance and Internet Search Behavior , 2017 .

[6]  Y. de Montjoye,et al.  Unique in the shopping mall: On the reidentifiability of credit card metadata , 2015, Science.

[7]  Frank Piessens,et al.  FPDetective: dusting the web for fingerprinters , 2013, CCS.

[8]  David Wetherall,et al.  Detecting and Defending Against Third-Party Tracking on the Web , 2012, NSDI.

[9]  Claude Castelluccia,et al.  On the uniqueness of Web browsing history patterns , 2014, Ann. des Télécommunications.

[10]  Hovav Shacham,et al.  Pixel Perfect : Fingerprinting Canvas in HTML 5 , 2012 .

[11]  Wouter Joosen,et al.  Cookieless Monster: Exploring the Ecosystem of Web-Based Device Fingerprinting , 2013, 2013 IEEE Symposium on Security and Privacy.

[12]  David K. Y. Yau,et al.  Privacy vulnerability of published anonymous mobility traces , 2010, MobiCom.

[13]  Christopher Krügel,et al.  A Practical Attack to De-anonymize Social Network Users , 2010, 2010 IEEE Symposium on Security and Privacy.

[14]  Nicolas Christin,et al.  Dissecting one click frauds , 2010, CCS '10.

[15]  Chris Jay Hoofnagle,et al.  Flash Cookies and Privacy , 2009, AAAI Spring Symposium: Intelligent Information Privacy Management.

[16]  Arvind Narayanan,et al.  The Web Never Forgets: Persistent Tracking Mechanisms in the Wild , 2014, CCS.

[17]  Monica Chew Contextual Identity : Freedom to be All Your Selves , 2013 .

[18]  Serge Egelman,et al.  Fingerprinting Web Users Through Font Metrics , 2015, Financial Cryptography.

[19]  Timothy Libert,et al.  Exposing the Hidden Web: An Analysis of Third-Party HTTP Requests on 1 Million Websites , 2015, ArXiv.

[20]  David J. Crandall,et al.  De-Anonymizing Users Across Heterogeneous Social Computing Platforms , 2013, ICWSM.

[21]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[22]  George Danezis,et al.  An Automated Social Graph De-anonymization Technique , 2014, WPES.

[23]  Peter Eckersley,et al.  How Unique Is Your Web Browser? , 2010, Privacy Enhancing Technologies.

[24]  Augustin Chaintreau,et al.  "I knew they clicked when i saw them with their friends": identifying your silent web visitors on social media , 2014, COSN '14.

[25]  Sharad Goel,et al.  The Effect of Recommendations on Network Structure , 2016, WWW.

[26]  Georgios Zervas,et al.  Understanding Emerging Threats to Online Advertising , 2016, EC.

[27]  Chris Jay Hoofnagle,et al.  Flash Cookies and Privacy II: Now with HTML5 and ETag Respawning , 2011 .

[28]  J. Penney Chilling Effects: Online Surveillance and Wikipedia Use , 2016 .

[29]  Tadayoshi Kohno,et al.  Internet Jones and the Raiders of the Lost Trackers: An Archaeological Study of Web Tracking from 1996 to 2016 , 2016, USENIX Security Symposium.

[30]  Nick Mathewson,et al.  Practical Traffic Analysis: Extending and Resisting Statistical Disclosure , 2004, Privacy Enhancing Technologies.

[31]  Arvind Narayanan,et al.  Online Tracking: A 1-million-site Measurement and Analysis , 2016, CCS.

[32]  Úlfar Erlingsson,et al.  RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response , 2014, CCS.

[33]  Foster J. Provost,et al.  The myth of the double-blind review?: author identification using only citations , 2003, SKDD.

[34]  Balachander Krishnamurthy,et al.  Privacy leakage vs . Protection measures : the growing disconnect , 2011 .

[35]  Walter Rudametkin,et al.  Beauty and the Beast: Diverting Modern Web Browsers to Build Unique Browser Fingerprints , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[36]  Claude Castelluccia,et al.  The Leaking Battery - A Privacy Analysis of the HTML5 Battery Status API , 2015, DPM/QASA@ESORICS.

[37]  Nina Mishra,et al.  Releasing search queries and clicks privately , 2009, WWW '09.

[38]  Balachander Krishnamurthy,et al.  On the leakage of personally identifiable information via online social networks , 2009, CCRV.