Learning Invariant Representation for Malicious Network Traffic Detection

Statistical learning theory assumes that the joint distribution of observations and labels is the same in training and testing data. This assumption is violated in many real-world problems, such as training a detector of malicious network traffic, where the traffic changes over time as attackers try to evade detection. We propose to address this problem with an optimized representation that significantly increases the robustness of detectors and classifiers trained under this distributional shift. The representation is created from bags of samples (e.g. network traffic logs) and is designed to be invariant under shifting and scaling of the feature values extracted from the logs, and under permutation and size changes of the bags. The invariance is achieved by combining feature histograms with feature self-similarity matrices computed for each bag, which significantly reduces the difference between the training and testing data. The parameters of the representation, such as histogram bin boundaries, are learned jointly with the classifier. We show that the representation is effective for training a detector of malicious traffic, achieving 90% precision and 67% recall on samples of previously unseen malware variants.
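The invariance properties described above can be illustrated with a minimal sketch for a single feature. This is not the authors' implementation: the function name `bag_representation`, the use of absolute differences as the self-similarity measure, the normalization by the largest difference, and the fixed uniform bins are all assumptions made for illustration (the abstract states that bin boundaries are learned jointly with the classifier, which is not attempted here).

```python
import numpy as np

def bag_representation(values, n_bins=8):
    """Hypothetical invariant descriptor of one feature over one non-empty bag."""
    x = np.asarray(values, dtype=float)
    # Self-similarity matrix of pairwise absolute differences:
    # adding a constant to every value leaves it unchanged (shift invariance).
    d = np.abs(x[:, None] - x[None, :])
    # Dividing by the largest difference removes the effect of multiplying
    # every value by a positive constant (scale invariance).
    peak = d.max()
    if peak > 0:
        d = d / peak
    # A histogram of the matrix entries ignores the order of samples
    # (permutation invariance). Uniform bins are an assumption of this sketch.
    hist, _ = np.histogram(d, bins=n_bins, range=(0.0, 1.0))
    # Normalizing to relative frequencies makes bags of different sizes comparable.
    return hist / hist.sum()

bag = [1.0, 5.0, 2.0, 9.0]
shifted_scaled_permuted = [3 * v + 7 for v in [9.0, 1.0, 2.0, 5.0]]
print(np.allclose(bag_representation(bag),
                  bag_representation(shifted_scaled_permuted)))
```

In this sketch, two bags whose feature values differ only by a shift, a positive scaling, or a reordering map to the same vector, so a classifier trained on such vectors does not see those changes as a difference between training and testing data.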
