Tracking User Application Activity by using Machine Learning Techniques on Network Traffic

A network eavesdropper may invade the privacy of an online user by collecting passing traffic and classifying the applications that generated it. The collected data may then be used to build fingerprints of the user's Internet usage. In this paper, we investigate the feasibility of such a breach on encrypted network traffic generated by actual users. We adopt the random forest algorithm to classify the applications in use by users of a campus network. Rather than inspecting packet contents, our classification system identifies and quantifies statistical features of users' network traffic. In addition, classification does not rely on port mapping at the transport layer. Our results show that applications can be identified with an average precision and recall of up to 99%.
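To make the idea of content-free classification concrete, the following is a minimal sketch of the two supporting steps the abstract mentions: deriving statistical features from a flow without reading payloads, and averaging precision and recall across applications. The feature set, the function names `flow_features` and `macro_precision_recall`, and the choice of macro-averaging are illustrative assumptions, not the features or averaging scheme reported in the paper. The random forest itself is omitted; the feature dictionaries would be the input to whichever classifier is trained.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Summarize one flow using only packet timing and size.

    `packets` is a non-empty list of (timestamp_seconds, size_bytes) tuples
    in arrival order. The feature set is hypothetical, chosen for illustration.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Inter-arrival gaps between consecutive packets.
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "pkt_count": len(packets),
        "total_bytes": sum(sizes),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "duration": times[-1] - times[0],
    }


def macro_precision_recall(y_true, y_pred):
    """Per-application precision and recall, averaged with equal class weight.

    Macro-averaging is an assumption; the paper may average differently.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return mean(precisions), mean(recalls)


if __name__ == "__main__":
    # Toy flow: three packets observed over one second.
    print(flow_features([(0.0, 100), (0.5, 300), (1.0, 200)]))
    # Toy labels for two hypothetical application classes.
    truth = ["web", "web", "video", "video"]
    guess = ["web", "video", "video", "video"]
    print(macro_precision_recall(truth, guess))
```

None of these features requires decrypting traffic or consulting port numbers, which is why this style of classification remains possible on encrypted flows.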
