WebGraph: Capturing Advertising and Tracking Information Flows for Robust Blocking

Millions of web users depend directly on ad and tracker blocking tools to protect their privacy. However, existing ad and tracker blockers fall short because they rely on advertising and tracking content that is trivially susceptible to manipulation. In this paper, we first demonstrate that state-of-the-art machine-learning-based ad and tracker blockers, such as ADGRAPH, are susceptible to adversarial evasions deployed in the real world. Second, we introduce WEBGRAPH, the first graph-based machine learning blocker that detects ads and trackers based on their actions rather than their content. By building features around the actions that are fundamental to advertising and tracking (storing an identifier in the browser, or sharing an identifier with another tracker), WEBGRAPH performs nearly as well as prior approaches but is significantly more robust to adversarial evasions. In particular, we show that WEBGRAPH achieves accuracy comparable to ADGRAPH while reducing the adversary's success rate from near-perfect under ADGRAPH to around 8% under WEBGRAPH. Finally, we show that WEBGRAPH remains robust to a more sophisticated adversary that uses evasion techniques beyond those currently deployed on the web.
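To make the two actions named above concrete, the following minimal Python sketch counts how often an identifier stored by one host later appears in a request to a different host. It is an illustration only, not the WEBGRAPH feature extractor: the function names, the input layout, and the `MIN_ID_LENGTH` threshold are all assumptions introduced here.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical threshold: shorter values are unlikely to be identifiers.
MIN_ID_LENGTH = 8


def shared_identifiers(stored_values, request_url):
    """Return the stored values that reappear as query-parameter values."""
    query = urlsplit(request_url).query
    param_values = {value for _, value in parse_qsl(query)}
    return sorted(
        v for v in stored_values
        if len(v) >= MIN_ID_LENGTH and v in param_values
    )


def flow_features(values_by_setter, request_urls):
    """Count storage and sharing actions for each host that stored a value.

    values_by_setter: dict mapping a host to the set of values it stored
                      (for example, cookie values).
    request_urls:     list of URLs requested during the page load.
    """
    features = {}
    for setter, values in values_by_setter.items():
        shares = 0
        for url in request_urls:
            destination = urlsplit(url).hostname
            # A sharing action: a stored identifier sent to another host.
            if destination != setter and shared_identifiers(values, url):
                shares += 1
        features[setter] = {
            "stored": len(values),
            "shared_with_other_host": shares,
        }
    return features
```

For example, if `tracker-a.example` stores the value `a1b2c3d4e5f6` and a later request goes to `https://tracker-b.example/sync?uid=a1b2c3d4e5f6`, the sketch records one storage action and one sharing action for `tracker-a.example`. These counts depend on what the resource does rather than on its URL or script text, which is the property the abstract attributes to action-based features.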
