The cookie recipe: Untangling the use of cookies in the wild

Users are commonly tracked through HTTP cookies while browsing the web. To protect their privacy, users tend to rely on simple tools that block HTTP cookies. However, the “block all” design of these tools breaks critical web services or severely limits the online advertising ecosystem. Easing this tension requires a more nuanced strategy that better discerns the intended functionality of the HTTP cookies users encounter. We present the first large-scale study of the use of HTTP cookies in the wild, using network traces containing more than 5.6 billion HTTP requests from real users over a period of two and a half months. We first present a statistical analysis of how cookies are used. We then analyze the structure of cookies and observe that HTTP cookies are significantly more sophisticated than the name=value format defined by the standard and assumed by researchers and developers. Based on our findings, we present an algorithm that extracts the information included in 86% of the cookies in our dataset with an accuracy of 91.7%. Finally, we discuss the implications of our findings and provide solutions that can be used to improve the most promising privacy-preserving tools.
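
To illustrate what "more sophisticated than name=value" can mean in practice, the following minimal Python sketch splits a Cookie header into top-level pairs and then looks for nested fields inside each value. It is not the algorithm from the paper, which the abstract does not describe. The function names, the delimiter choices, and the sample header are all hypothetical and chosen only for illustration.

```python
from urllib.parse import unquote

# Hypothetical delimiter sets for illustration; the abstract does not say
# which separators real cookies use or which ones the paper's algorithm handles.
PAIR_DELIMITERS = "&|,"
KV_DELIMITERS = "=:"


def parse_cookie_header(header):
    """Split a Cookie request header into top-level name=value pairs."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:  # skip fragments that have no '=' at all
            cookies[name] = value
    return cookies


def extract_fields(value):
    """Return nested key/value fields found in one cookie value.

    Returns an empty dict when the value looks atomic (no nested structure
    under the delimiter sets assumed above).
    """
    decoded = unquote(value)  # nested structure is often percent-encoded
    for pair_delim in PAIR_DELIMITERS:
        chunks = decoded.split(pair_delim)
        if len(chunks) < 2:
            continue
        for kv_delim in KV_DELIMITERS:
            # accept this split only if every chunk is itself a key/value pair
            if all(kv_delim in chunk for chunk in chunks):
                return dict(chunk.split(kv_delim, 1) for chunk in chunks)
    return {}


# Made-up example header, not taken from the paper's dataset.
header = "sid=abc123; prefs=lang%3Den%26tz%3DUTC; utm=id:42|src:mail"
cookies = parse_cookie_header(header)
nested = {name: extract_fields(value) for name, value in cookies.items()}
```

In this sketch a single top-level cookie such as `prefs` carries two inner fields, which a tool that treats the value as one opaque string would miss. A real extractor would need to handle many more encodings and delimiter conventions than the ones assumed here.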
