Malware Detection by HTTPS Traffic Analysis

To evade detection by network-traffic analysis, a growing proportion of malware uses the encrypted HTTPS protocol. We explore the problem of detecting malware on client computers based on HTTPS traffic analysis. In this setting, the payload is encrypted, so malware has to be detected based on the host IP address, ports, timestamp, and data-volume information of the TCP/IP packets that are sent and received by all the applications on the client. We develop a scalable protocol that allows us to collect network flows of known malicious and benign applications as training data, and we derive a malware-detection method based on neural networks and sequence classification. We study the method's ability to detect both known and new, previously unknown malware in a large-scale empirical study.
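As a rough illustration of the input such a method works with, the sketch below groups flow records by client and orders them by time, so that each client yields one sequence for a sequence classifier. It is not the authors' implementation: the `Flow` record, the `client` field, the function names, and the choice of log-scaled byte counts and inter-flow gaps as features are all assumptions made for this example, and the neural network itself is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import log1p


@dataclass(frozen=True)
class Flow:
    """One observed HTTPS flow (hypothetical record layout)."""
    client: str          # assumed identifier of the client computer
    host_ip: str         # remote host IP address
    port: int            # remote port
    timestamp: float     # start time in seconds
    bytes_sent: int      # data volume sent by the client
    bytes_received: int  # data volume received by the client


def flow_sequences(flows):
    """Group flows by client and sort each group by timestamp."""
    by_client = defaultdict(list)
    for flow in flows:
        by_client[flow.client].append(flow)
    return {
        client: sorted(group, key=lambda f: f.timestamp)
        for client, group in by_client.items()
    }


def sequence_features(sequence):
    """Turn a time-ordered flow sequence into per-flow feature tuples.

    Each tuple is (host_ip, port, gap, log_sent, log_received), where gap is
    the time since the previous flow (0.0 for the first flow). These features
    are an assumption for illustration, not taken from the source.
    """
    features = []
    previous_time = None
    for flow in sequence:
        gap = 0.0 if previous_time is None else flow.timestamp - previous_time
        features.append(
            (flow.host_ip, flow.port, gap,
             log1p(flow.bytes_sent), log1p(flow.bytes_received))
        )
        previous_time = flow.timestamp
    return features
```

A sequence classifier would then consume one such feature sequence per client (or per client and time window) and output a malicious-or-benign decision.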
