Optimizing Precision for Open-World Website Fingerprinting

Traffic analysis attacks to identify which web page a client is browsing, using only her packet metadata --- known as website fingerprinting --- has been proven effective in closed-world experiments against privacy technologies like Tor. However, due to the base rate fallacy, these attacks have failed in large open-world settings against clients that visit sensitive pages with a low base rate. We find that this is because they have poor precision as they were designed to maximize recall. In this work, we argue that precision is more important than recall for open-world website fingerprinting. For this reason, we develop three classes of {\em precision optimizers}, based on confidence, distance, and ensemble learning, that can be applied to any classifier to increase precision. We test them on known website fingerprinting attacks and show significant improvements in precision. Against a difficult scenario, where the attacker wants to monitor and distinguish 100 sensitive pages each with a low mean base rate of 0.00001, our best optimized classifier can achieve a precision of 0.78; the highest precision of any known attack before optimization was 0.014. We use precise classifiers to tackle realistic objectives in website fingerprinting, including selection, identification, and defeating website fingerprinting defenses.

[1]  George Danezis,et al.  k-fingerprinting: A Robust Scalable Website Fingerprinting Technique , 2015, USENIX Security Symposium.

[2]  Klaus Wehrle,et al.  Website Fingerprinting at Internet Scale , 2016, NDSS.

[3]  Xiang Cai,et al.  Glove: A Bespoke Website Fingerprinting Defense , 2014, WPES.

[4]  Brijesh Joshi,et al.  Touching from a distance: website fingerprinting attacks and defenses , 2012, CCS.

[5]  Tao Wang,et al.  Website Fingerprinting: Attacks and Defenses , 2016 .

[6]  Tao Wang,et al.  A Systematic Approach to Developing and Evaluating Website Fingerprinting Defenses , 2014, CCS.

[7]  Tao Wang,et al.  On Realistically Attacking Tor with Website Fingerprinting , 2016, Proc. Priv. Enhancing Technol..

[8]  David D. Jensen,et al.  Privacy Vulnerabilities in Encrypted HTTP Streams , 2005, Privacy Enhancing Technologies.

[9]  J. Koehler,et al.  The Coming Paradigm Shift in Forensic Identification Science , 2005, Science.

[10]  Brian Neil Levine,et al.  Inferring the source of encrypted HTTP connections , 2006, CCS '06.

[11]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[12]  H. Cheng,et al.  Traffic Analysis of SSL Encrypted Web Browsing , 1998 .

[13]  Tao Wang,et al.  Effective Attacks and Provable Defenses for Website Fingerprinting , 2014, USENIX Security Symposium.

[14]  Andrew Hintz,et al.  Fingerprinting Websites Using Traffic Analysis , 2002, Privacy Enhancing Technologies.

[15]  Hannes Federrath,et al.  Website fingerprinting: attacking popular privacy enhancing technologies with the multinomial naïve-bayes classifier , 2009, CCSW '09.

[16]  Tao Wang,et al.  Walkie-Talkie: An Efficient Defense Against Passive Website Fingerprinting Attacks , 2017, USENIX Security Symposium.

[17]  Ian Goldberg,et al.  Changing of the guards: a framework for understanding and improving entry guard selection in tor , 2012, WPES '12.

[18]  Nick Mathewson,et al.  Tor: The Second-Generation Onion Router , 2004, USENIX Security Symposium.

[19]  Tao Wang,et al.  Improved website fingerprinting on Tor , 2013, WPES.

[20]  Mike Perry,et al.  Toward an Efficient Website Fingerprinting Defense , 2015, ESORICS.

[21]  Thomas Engel,et al.  Website fingerprinting in onion routing based anonymization networks , 2011, WPES.

[22]  Rachel Greenstadt,et al.  A Critical Evaluation of Website Fingerprinting Attacks , 2014, CCS.

[23]  Mun Choon Chan,et al.  Website Fingerprinting and Identification Using Ordered Feature Sequences , 2010, ESORICS.

[24]  J. Swets ROC analysis applied to the evaluation of medical imaging techniques. , 1979, Investigative radiology.

[25]  R. Dingledine,et al.  One Fast Guard for Life ( or 9 months ) , 2014 .

[26]  Lili Qiu,et al.  Statistical identification of encrypted Web browsing traffic , 2002, Proceedings 2002 IEEE Symposium on Security and Privacy.

[27]  Tadeusz Pietraszek,et al.  Using Adaptive Alert Classification to Reduce False Positives in Intrusion Detection , 2004, RAID.

[28]  Law. Policy Executive Summary of the National Academies of Science Reports, Strengthening Forensic Science in the United States: A Path Forward , 2009 .