A Model for Detecting Tor Encrypted Traffic using Supervised Machine Learning

Tor is a low-latency anonymity tool and one of the most widely used open-source tools for anonymizing TCP traffic on the Internet, with around 500,000 users every day. Tor protects users' privacy against surveillance and censorship by making it extremely difficult for an observer to correlate the websites a user visits with that user's real-world identity. It does so by protecting Tor traffic against traffic analysis and feature extraction techniques, and it resists website fingerprinting through defences such as TLS encryption, padding, and packet relaying. In this paper, Tor is analysed from the position of a local observer who attempts to bypass these protections; the method consists of extracting features from a dataset captured on a local network. The analysis shows that, despite Tor's protections, a local observer can still fingerprint the top monitored sites on Alexa and can distinguish Tor traffic from other HTTPS traffic on the network. Several supervised machine-learning algorithms were employed in the experiment. The attack assumes a local observer on a local network fingerprinting the top 100 sites on Alexa; the results improve on previous work, achieving an accuracy of 99.64% with a false positive rate of 0.01%.
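As a rough illustration of the two quantities the abstract relies on, per-flow features and the accuracy and false positive figures, the following minimal sketch uses only the Python standard library. It is not the paper's feature set or classifier: the feature names, the packet representation (timestamp plus signed size, positive meaning outgoing), and the label strings "tor" and "https" are all assumptions chosen for the example, and the sample values are made up.

```python
from statistics import mean


def flow_features(packets):
    """Summarise one flow as a small feature dictionary.

    `packets` is a hypothetical representation: a list of
    (timestamp_seconds, signed_size) pairs, where a positive size
    means outgoing and a negative size means incoming.
    """
    sizes = [abs(size) for _, size in packets]
    times = [ts for ts, _ in packets]
    # Inter-arrival gaps between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    outgoing = sum(1 for _, size in packets if size > 0)
    return {
        "n_packets": len(packets),
        "mean_size": mean(sizes),
        "frac_outgoing": outgoing / len(packets),
        "mean_gap": mean(gaps) if gaps else 0.0,
    }


def accuracy_and_fpr(y_true, y_pred, positive="tor"):
    """Return (accuracy, false positive rate) for a binary labelling.

    A false positive is a non-Tor flow that was predicted as Tor.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    # Guard against a test set that contains no negative examples.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Made-up sample flow and made-up predictions, for illustration only.
sample_flow = [(0.0, 512), (0.1, -512), (0.3, 512), (0.4, -1024)]
features = flow_features(sample_flow)
acc, fpr = accuracy_and_fpr(
    ["tor", "https", "https", "tor"],
    ["tor", "tor", "https", "tor"],
)
```

In a real pipeline, feature dictionaries like these would be computed for every captured flow and passed to a supervised classifier; the false positive rate is reported separately from accuracy because it measures how often ordinary HTTPS flows are mistaken for Tor.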
