Tor traffic analysis and detection via machine learning techniques

Tor is an anonymous Internet communication system based on the second generation of the onion routing protocol. Tor makes it very difficult to trace a user's Internet activity: it is intended to protect users' privacy, their freedom, and their ability to conduct confidential communications without being monitored. Tor is also increasingly used by cyber-criminals to cover their illegal activities: the Tor community has observed, for instance, an alarming increase in the number of malware samples that abuse the popular anonymizing network to hide their command and control infrastructures. In this paper we present a technique able to identify whether a host is generating Tor-related traffic. We use well-known machine learning algorithms to evaluate the effectiveness of the proposed feature set in a real-world environment. In addition, we demonstrate that the proposed method is able to recognize the kind of activity (e.g., email or P2P applications) that the user under analysis is performing on the Tor network.
