Learning from Time Series with Outlier Correction for Malicious Domain Identification

Malicious domain identification is an important task in cyberspace security. However, most existing work on this task relies heavily on expert experience when constructing machine learning features. Worse, attackers can deliberately manipulate these features, so such identification methods are easily bypassed by cyber criminals. To solve this problem, we propose a novel method for malicious domain identification that learns time series shapelets, the discriminative local patterns of time series. Our method consists of two main components: 1) a model of users' domain access habits, obtained by learning shapelets from domain time series. Because a domain's time series is generated by the crowd visiting the website, the learned access habits can potentially reflect what type of service the domain provides, such as pornography or gambling. 2) an outlier correction algorithm that operates on a single time series, is independent of the model, and enhances the robustness of shapelet initialization. We integrate shapelet learning and outlier correction in our model. Extensive experiments on a real-world dataset demonstrate that the proposed method outperforms state-of-the-art methods.
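To make the two components concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes the commonly used shapelet-to-series distance (the minimum mean squared difference between a shapelet and every same-length window of the series) and substitutes a simple median/MAD rule for the paper's outlier correction algorithm, which the abstract does not specify. The names `shapelet_distance`, `correct_outliers`, and `threshold`, as well as the toy visit counts, are hypothetical.

```python
import numpy as np


def shapelet_distance(series, shapelet):
    """Minimum mean squared difference between the shapelet and any
    same-length window of the series (a common shapelet distance)."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    # All contiguous windows of the series with the shapelet's length.
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.min(np.mean((windows - shapelet) ** 2, axis=1)))


def correct_outliers(series, threshold=3.0):
    """Stand-in outlier correction (assumption, not the paper's algorithm):
    replace points whose robust z-score exceeds `threshold` with the median.
    Uses only the single series it is given, independent of any model."""
    series = np.asarray(series, dtype=float)
    median = np.median(series)
    mad = np.median(np.abs(series - median))
    if mad == 0:
        # No spread to measure against; leave the series unchanged.
        return series.copy()
    robust_z = 0.6745 * (series - median) / mad
    corrected = series.copy()
    corrected[np.abs(robust_z) > threshold] = median
    return corrected


# Toy hourly visit counts for one domain, with a single spike at index 5.
visits = [3, 4, 5, 4, 3, 90, 4, 5, 4, 3]
cleaned = correct_outliers(visits)

# A hypothetical shapelet describing a short rise-and-fall access pattern.
shapelet = [3, 4, 5, 4]
print(shapelet_distance(cleaned, shapelet))
```

Correcting the spike before any shapelet is extracted from the series keeps a single anomalous reading from dominating candidate subsequences, which is the sense in which outlier correction can make shapelet initialization more robust.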
