Tracing Cross Border Web Tracking

A tracking flow is a flow between an end user and a Web tracking service. We develop an extensive measurement methodology for quantifying, at scale, the number of tracking flows that cross data protection borders, be they national or international, such as the EU28 border within which the General Data Protection Regulation (GDPR) applies. Our methodology uses a browser extension to fully render advertising and tracking code, various lists and heuristics to extract well-known trackers, passive DNS replication to obtain all the IP ranges of trackers, and state-of-the-art geolocation. We apply our methodology to a dataset from 350 real users of the browser extension over a period of more than four months, and then generalize our results by analyzing billions of web tracking flows from more than 60 million broadband and mobile users of 4 large European ISPs. We show that the majority of tracking flows cross national borders in Europe but, contrary to popular belief, are largely confined within the larger GDPR jurisdiction. Simple DNS redirection and PoP mirroring can increase national confinement while sealing almost all tracking flows within Europe. Last, we show that cross-border tracking is prevalent even in sensitive, and hence protected, data categories and groups, including health, sexual orientation, minors, and others.
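The border classification at the core of this methodology can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' pipeline. It assumes tracker IP prefixes have already been collected (for example via passive DNS replication) and geolocated, and that every user sits in an EU28 country. The prefix-to-country table, the IP addresses (drawn from the RFC 5737 documentation ranges), the category labels, and the `pick_pop` redirection policy are all hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# EU28 member states (ISO 3166-1 alpha-2 codes) at the time of the study,
# i.e., the GDPR jurisdiction including the United Kingdom.
EU28 = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE", "GB",
})

# Hypothetical geolocation table: tracker IP prefixes (assumed to come from
# passive DNS replication) mapped to the country of the serving host.
PREFIX_COUNTRY: Dict[ipaddress.IPv4Network, str] = {
    ipaddress.ip_network("192.0.2.0/24"): "DE",
    ipaddress.ip_network("198.51.100.0/24"): "US",
    ipaddress.ip_network("203.0.113.0/24"): "IE",
    ipaddress.ip_network("203.0.113.0/25"): "GR",  # more specific prefix
}


def locate(ip: str) -> Optional[str]:
    """Return the country of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    # Membership is simply False for an IPv6 address against IPv4 prefixes.
    matches = [net for net in PREFIX_COUNTRY if addr in net]
    if not matches:
        return None
    return PREFIX_COUNTRY[max(matches, key=lambda net: net.prefixlen)]


def classify_flow(user_country: str, tracker_ip: str) -> str:
    """Label a user-to-tracker flow by the borders it crosses."""
    if user_country not in EU28:
        raise ValueError(f"user country {user_country!r} is not in EU28")
    server_country = locate(tracker_ip)
    if server_country is None:
        return "unknown"
    if server_country == user_country:
        return "national"  # confined within the user's country
    if server_country in EU28:
        return "intra-EU28"  # crosses a national border, stays under GDPR
    return "leaves-EU28"  # crosses the GDPR border


def confinement_summary(flows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of flows falling into each category."""
    counts = Counter(classify_flow(country, ip) for country, ip in flows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def pick_pop(user_country: str, pop_countries: Iterable[str]) -> Optional[str]:
    """Hypothetical DNS redirection policy: answer with a PoP in the user's
    country if one exists, else an EU28 PoP, else None (no redirection)."""
    pops = set(pop_countries)
    if user_country in pops:
        return user_country
    eu_pops = sorted(pops & EU28)
    return eu_pops[0] if eu_pops else None


# Hypothetical (user country, tracker server IP) pairs.
flows: List[Tuple[str, str]] = [
    ("DE", "192.0.2.10"),     # German user, tracker host in DE
    ("GR", "203.0.113.5"),    # matches the /25, located in GR
    ("GR", "203.0.113.200"),  # only the /24 matches, located in IE
    ("ES", "198.51.100.7"),   # Spanish user, tracker host in US
]

if __name__ == "__main__":
    print(confinement_summary(flows))
    # {'national': 0.5, 'intra-EU28': 0.25, 'leaves-EU28': 0.25}
    print(pick_pop("ES", ["US", "IE", "DE"]))  # DE
```

The longest-prefix match lets a more specific prefix (here the /25) override a covering one, which matters when a tracker's address block is served from several countries. A real measurement would replace the toy table with a geolocation database or active measurements.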
