Profiling Tor Users with Unsupervised Learning Techniques

Website fingerprinting has been shown to be effective against Tor, one of the most popular low-latency anonymity networks. In this attack, a local network adversary recovers a client’s browsing history from the traffic fingerprints observed on the client’s connection to the Tor network. Previous studies of website fingerprinting focus on designing supervised classifiers that identify visits to a set of target websites. In this paper, we consider an adversary with the same capabilities as in website fingerprinting who instead uses unsupervised techniques to profile users’ browsing activity. We use OPTICS, a density-based clustering algorithm, to group similar traffic samples, and the BCubed precision and recall metrics to measure the quality of the resulting clustering. For a world of 100 websites, we show that, under mild assumptions, the attacker can group visits by different users to the same site with a success rate above 50%. We also evaluate how the number of distinct pages that users can access affects the attack, and find that for a world of 1,000 pages its performance does not drop significantly.
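BCubed precision and recall score a clustering one item at a time. For each traffic sample, precision is the fraction of samples in its cluster that come from the same website. Recall is the fraction of samples from its website that ended up in its cluster. Both are then averaged over all samples. The code below is a minimal pure-Python sketch of these standard definitions, not the authors' evaluation code. The function names `bcubed` and `noise_to_singletons` are hypothetical. Treating OPTICS noise points (which scikit-learn's `OPTICS` labels `-1`) as singleton clusters is an assumption, and the paper may handle noise differently.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def noise_to_singletons(cluster_ids: Sequence[Hashable], noise: Hashable = -1) -> List[Hashable]:
    """Give every noise point its own singleton cluster (assumed handling of OPTICS noise)."""
    return [("noise", i) if c == noise else c for i, c in enumerate(cluster_ids)]


def bcubed(clusters: Sequence[Hashable], labels: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under the BCubed definitions.

    clusters[i] is the cluster assigned to traffic sample i;
    labels[i] is its ground truth (the website actually visited).
    """
    if len(clusters) != len(labels):
        raise ValueError("clusters and labels must have the same length")
    if len(clusters) == 0:
        raise ValueError("need at least one sample")
    n = len(clusters)
    cluster_size = Counter(clusters)   # samples per cluster
    class_size = Counter(labels)       # samples per website
    # Samples sharing both cluster and website with a given sample, itself included.
    joint_size = Counter(zip(clusters, labels))

    pairs = list(zip(clusters, labels))
    # Per-sample precision: same-site share of the sample's cluster, averaged over samples.
    precision = sum(joint_size[p] / cluster_size[p[0]] for p in pairs) / n
    # Per-sample recall: share of the sample's site that landed in its cluster.
    recall = sum(joint_size[p] / class_size[p[1]] for p in pairs) / n
    # Each sample counts itself, so precision and recall are > 0 and F1 is well defined.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    sites = ["a", "a", "a", "b", "b"]   # ground-truth website per visit
    found = [0, 0, 1, 1, 1]             # cluster ids from some clustering run
    print([round(x, 3) for x in bcubed(found, sites)])  # [0.733, 0.733, 0.733]
```

In the toy run, two of the three visits to site `a` share a cluster. The third visit is grouped with both visits to site `b`. As a result, precision and recall both come out at 11/15 ≈ 0.733. Summarizing them with their harmonic mean (F1) is also an illustrative choice here.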
