Website fingerprinting: attacking popular privacy enhancing technologies with the multinomial naïve-bayes classifier

Privacy enhancing technologies like OpenSSL, OpenVPN or Tor establish an encrypted tunnel that enables users to hide content and addresses of requested websites from external observers This protection is endangered by local traffic analysis attacks that allow an external, passive attacker between the PET system and the user to uncover the identity of the requested sites. However, existing proposals for such attacks are not practicable yet. We present a novel method that applies common text mining techniques to the normalised frequency distribution of observable IP packet sizes. Our classifier correctly identifies up to 97% of requests on a sample of 775 sites and over 300,000 real-world traffic dumps recorded over a two-month period. It outperforms previously known methods like Jaccard's classifier and Naïve Bayes that neglect packet frequencies altogether or rely on absolute frequency values, respectively. Our method is system-agnostic: it can be used against any PET without alteration. Closed-world results indicate that many popular single-hop and even multi-hop systems like Tor and JonDonym are vulnerable against this general fingerprinting attack. Furthermore, we discuss important real-world issues, namely false alarms and the influence of the browser cache on accuracy.

[1]  Ian H. Witten,et al.  Data mining - practical machine learning tools and techniques, Second Edition , 2005, The Morgan Kaufmann series in data management systems.

[2]  Charles V. Wright,et al.  Traffic Morphing: An Efficient Defense Against Statistical Traffic Analysis , 2009, NDSS.

[3]  Andrew W. Moore,et al.  Internet traffic classification using bayesian analysis techniques , 2005, SIGMETRICS '05.

[4]  Renato Lo Cigno,et al.  Traffic Flow Confidentiality in IPsec: Protocol and Implementation , 2007, FIDIS.

[5]  Ian Witten,et al.  Data Mining , 2000 .

[6]  H. Cheng,et al.  Traffic Analysis of SSL Encrypted Web Browsing , 1998 .

[7]  Nikita Borisov,et al.  Privacy Enhancing Technologies, 7th International Symposium, PET 2007 Ottawa, Canada, June 20-22, 2007, Revised Selected Papers , 2007, Privacy Enhancing Technologies.

[8]  Pat Langley,et al.  Estimating Continuous Distributions in Bayesian Classifiers , 1995, UAI.

[9]  David Chaum,et al.  Untraceable electronic mail, return addresses, and digital pseudonyms , 1981, CACM.

[10]  Andrew Hintz,et al.  Fingerprinting Websites Using Traffic Analysis , 2002, Privacy Enhancing Technologies.

[11]  Wallace Koehler,et al.  An Analysis of Web Page and Web Site Constancy and Permanence , 1999, J. Am. Soc. Inf. Sci..

[12]  Nick Mathewson,et al.  Tor: The Second-Generation Onion Router , 2004, USENIX Security Symposium.

[13]  Cornelia Boldyreff,et al.  The evolution of Websites , 1999, Proceedings Seventh International Workshop on Program Comprehension.

[14]  Andrew W. Moore,et al.  Traffic Classification Using a Statistical Approach , 2005, PAM.

[15]  Brian Neil Levine,et al.  Inferring the source of encrypted HTTP connections , 2006, CCS '06.

[16]  Jeffrey Erman,et al.  Internet Traffic Identification using Machine Learning , 2006 .

[17]  Andriy Panchenko,et al.  Towards Practical Attacker Classification for Risk Analysis in Anonymous Communication , 2006, Communications and Multimedia Security.

[18]  Dirk Grunwald,et al.  Low-resource routing attacks against tor , 2007, WPES '07.

[19]  Christopher Olston,et al.  What's new on the web?: the evolution of the web from a search engine perspective , 2004, WWW '04.

[20]  Lili Qiu,et al.  Statistical identification of encrypted Web browsing traffic , 2002, Proceedings 2002 IEEE Symposium on Security and Privacy.

[21]  George Danezis,et al.  Low-cost traffic analysis of Tor , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).

[22]  Hannes Federrath,et al.  Performance Comparison of Low-Latency Anonymisation Services from a User Perspective , 2007, Privacy Enhancing Technologies.

[23]  Eric C. Price,et al.  Browser-Based Attacks on Tor , 2007, Privacy Enhancing Technologies.

[24]  David D. Jensen,et al.  Privacy Vulnerabilities in Encrypted HTTP Streams , 2005, Privacy Enhancing Technologies.

[25]  Hannes Federrath,et al.  Web MIXes: A System for Anonymous and Unobservable Internet Access , 2000, Workshop on Design Issues in Anonymity and Unobservability.

[26]  Spyros Antonatos,et al.  On the Privacy Risks of Publishing Anonymized IP Network Traces , 2006, Communications and Multimedia Security.

[27]  Wallace Koehler,et al.  Web page change and persistence - A four-year longitudinal study , 2002, J. Assoc. Inf. Sci. Technol..

[28]  Ari Huttunen,et al.  UDP Encapsulation of IPsec ESP Packets , 2005, RFC.

[29]  Sebastian Zander,et al.  A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification , 2006, CCRV.

[30]  Bart Preneel,et al.  Towards Measuring Anonymity , 2002, Privacy Enhancing Technologies.

[31]  Tatu Ylönen,et al.  The Secure Shell (SSH) Connection Protocol , 2006, RFC.

[32]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[33]  Charles V. Wright,et al.  On Web Browsing Privacy in Anonymized NetFlows , 2007, USENIX Security Symposium.

[34]  Jean-François Raymond,et al.  Traffic Analysis: Protocols, Attacks, Design Issues, and Open Problems , 2000, Workshop on Design Issues in Anonymity and Unobservability.

[35]  Roger Dingledine,et al.  Privacy enhancing technologies : Second International Workshop, PET 2002, San Francisco, CA, USA, April 14-15, 2002 : revised papers , 2003 .