Alternative to third-party cookies: investigating persistent PII leakage-based web tracking

Many popular websites let users sign up for their services, which requires personally identifiable information (PII). However, these websites also embed third-party tracking and advertising resources, so the authentication flow can leak PII to those services, intentionally or unintentionally. Because PII identifies a user, trackers can use it for tracking, and the privacy loss grows when tracking spans sites, browsers, and devices. In this paper, we document a persistent web tracking mechanism that exploits the PII leaked after a user completes the sign-up and sign-in flows (authentication flows) on first-party sites. To the best of our knowledge, this is the first in-depth analysis of leaked PII in authentication flows. By investigating the authentication flows of 307 popular shopping sites from the Tranco top 10,000 sites, we first find that 42.3% of these sites leak PII to third-party services. We then present a previously unknown persistent web tracking technique based on PII leakage that enables tracking providers to generate a unique persistent identifier for a user and store it, together with the user's browsing history, on their tracking servers. By analyzing 130 first-party senders along with 100 third-party receiver domains, we show that PII leakage is a potentially important vector for online tracking for at least 20 providers. In addition, we examine the privacy policies of the 130 first-party senders and observe that they are not clear about PII exchange with third parties. Finally, to provide a wider picture of current in-browser privacy protection, we evaluate the effectiveness of browsers and well-known blocklists against PII leakage. We find that browsers are unable to prevent PII leakage, with the exception of Brave and its privacy-enhancing features, whereas blocklists reduce the number of PII leaks but do not solve the problem in general.
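As an illustration of what detecting PII leakage in an authentication flow could involve, the following minimal Python sketch scans recorded request URLs for a known email address in several encodings. It is not the paper's measurement pipeline: the set of encodings, the function names `pii_variants` and `find_leaks`, and the naive suffix-based first-party check are all assumptions made for this example. A real study would compare registrable domains rather than host suffixes, and would inspect request bodies, headers, and cookies as well as URLs.

```python
import base64
import hashlib
from urllib.parse import quote, urlsplit


def pii_variants(email: str) -> dict[str, str]:
    """Return common encodings of an email (assumed set, for illustration only)."""
    normalized = email.strip().lower()
    raw = normalized.encode("utf-8")
    return {
        "plain": normalized,
        "urlencoded": quote(normalized, safe=""),
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(email: str, first_party: str,
               request_urls: list[str]) -> list[tuple[str, str]]:
    """List (third-party host, encoding) pairs whose URL contains the email."""
    variants = pii_variants(email)
    leaks = []
    for url in request_urls:
        host = urlsplit(url).hostname or ""
        # Naive first-party check (assumption): exact host or subdomain match.
        if host == first_party or host.endswith("." + first_party):
            continue
        lowered = url.lower()
        for name, value in variants.items():
            # Case-sensitive match (needed for base64) or lowercase match (hex, plain).
            if value in url or value in lowered:
                leaks.append((host, name))
    return leaks
```

For example, calling `find_leaks("alice@example.com", "shop.test", urls)` on a hypothetical list of URLs captured during sign-in would flag any third-party request that carries the address in plain, URL-encoded, Base64, or hashed form. A hashed email is still a stable value that is the same on every site where the user signs in with that address, which is what makes it usable as a persistent identifier.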
