A Cause-Based Classification Approach for Malicious DNS Queries Detected Through Blacklists

Some of the most serious security threats facing computer networks involve malware. To counter these threats, administrators need to swiftly remove infected machines from their networks. One common way to detect infected machines is to monitor communications against blacklists. However, this method has two problems: no blacklist is completely reliable, and blacklists do not provide enough evidence for administrators to judge whether the detection results are valid and accurate. Simply matching communications against blacklist entries is therefore insufficient; administrators should investigate why each detection occurred by examining the communications themselves. In this paper, we propose an approach that classifies malicious DNS queries detected through blacklists according to their causes. The approach is motivated by the following observation: a malware communication consists of several transactions, each of which generates queries related to the malware, so the surrounding queries that occur before and after a blacklist-detected query help estimate its cause. Our cause-based classification drastically reduces the number of malicious queries to be investigated, because administrators need to examine only the representative queries of the resulting clusters. In our experiments, the approach grouped 388 malicious queries into 3 clusters, each consisting of queries with a common cause. These results indicate that administrators can quickly identify all the causes by investigating only the representative queries of each cluster, and can thereby swiftly deal with infected machines in the network.
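The core idea can be illustrated with a small sketch: for each blacklist hit, collect the other domains the same client queried shortly before and after it, group hits whose surrounding queries overlap, and keep one representative per group. The abstract does not specify the time window, the similarity measure, or the clustering algorithm, so everything below is an assumption: the names `Query`, `context`, `jaccard`, `cluster_by_cause`, `window`, and `threshold` are hypothetical, and Jaccard similarity with single-linkage grouping is a simple stand-in, not the authors' method.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Query:
    timestamp: float  # seconds since start of capture
    client: str       # host that issued the DNS query
    domain: str       # queried domain name


def context(log, target, window):
    """Domains the same client queried within +/- window seconds of target."""
    return frozenset(
        q.domain
        for q in log
        if q is not target
        and q.client == target.client
        and abs(q.timestamp - target.timestamp) <= window
    )


def jaccard(a, b):
    """Jaccard similarity of two domain sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_cause(log, blacklist, window=60.0, threshold=0.5):
    """Hypothetical stand-in: group blacklist hits whose surrounding queries
    overlap, then pick one representative (medoid) per group."""
    hits = [q for q in log if q.domain in blacklist]
    ctx = [context(log, q, window) for q in hits]

    # Union-find: link two hits when their contexts are similar enough
    # (single-linkage clustering with a similarity cut-off).
    parent = list(range(len(hits)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(hits)), 2):
        if jaccard(ctx[i], ctx[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(hits)):
        groups.setdefault(find(i), []).append(i)

    clusters = []
    for members in groups.values():
        # Representative = member with the highest summed similarity within its group.
        rep = max(members, key=lambda i: sum(jaccard(ctx[i], ctx[j]) for j in members))
        clusters.append((hits[rep], [hits[i] for i in members]))
    return clusters


# Toy log (invented for illustration): two hosts show the same pattern around
# one blacklisted domain; a third host hits another blacklisted domain elsewhere.
log = [
    Query(0.0, "h1", "update.example"),
    Query(5.0, "h1", "bad-c2.example"),
    Query(8.0, "h1", "cdn.example"),
    Query(1000.0, "h2", "update.example"),
    Query(1003.0, "h2", "bad-c2.example"),
    Query(1007.0, "h2", "cdn.example"),
    Query(2000.0, "h3", "ads.example"),
    Query(2002.0, "h3", "malvert.example"),
    Query(2004.0, "h3", "tracker.example"),
]
blacklist = {"bad-c2.example", "malvert.example"}

clusters = cluster_by_cause(log, blacklist)
for rep, members in clusters:
    print(f"{rep.domain}: {len(members)} queries (representative from {rep.client})")
```

Here three blacklist hits collapse into two groups, so an administrator would inspect two representative queries instead of three. The same reduction is what the abstract reports at a larger scale (388 queries into 3 clusters), though with the authors' own features and clustering rather than this toy version.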
