End-node Fingerprinting for Malware Detection on HTTPS Data

One of the current challenges in network intrusion detection research is malware that communicates over the HTTPS protocol. The usual task is to detect end-nodes infected with this type of malware by monitoring network traffic. The difficulty is that only a very limited number of weak features can be extracted from a network traffic capture of encrypted HTTP communication. This paper proposes a novel fingerprinting method that addresses this problem by building a higher-level end-node representation on top of the weak features. Large-scale experiments on real network data show that the proposed method outperforms the state-of-the-art solution, both producing fewer false alarms (higher precision) and detecting more infections (higher recall).
