Detecting Malware Based on DNS Graph Mining

Malware remains a major threat to nowadays Internet. In this paper, we propose a DNS graph mining-based malware detection approach. A DNS graph is composed of DNS nodes, which represent server IPs, client IPs, and queried domain names in the process of DNS resolution. After the graph construction, we next transform the problem of malware detection to the graph mining task of inferring graph nodes' reputation scores using the belief propagation algorithm. The nodes with lower reputation scores are inferred as those infected by malwares with higher probability. For demonstration, we evaluate the proposed malware detection approach with real-world dataset. Our real-world dataset is collected from campus DNS servers for three months and we built a DNS graph consisting of 19,340,820 vertices and 24,277,564 edges. On the graph, we achieve a true positive rate 80.63% with a false positive rate 0.023%. With a false positive of 1.20%, the true positive rate was improved to 95.66%. We detected 88,592 hosts infected by malware or C&C servers, accounting for the percentage of 5.47% among all hosts. Meanwhile, 117,971 domains are considered to be related to malicious activities, accounting for 1.5% among all domains. The results indicate that our method is efficient and effective in detecting malwares.

[1]  Felix C. Freiling,et al.  Measuring and Detecting Fast-Flux Service Networks , 2008, NDSS.

[2]  J. Laurie Snell,et al.  Markov Random Fields and Their Applications , 1980 .

[3]  Jun Li,et al.  On the state of IP spoofing defense , 2009, TOIT.

[4]  Wenke Lee,et al.  Detecting Malware Domains at the Upper DNS Hierarchy , 2011, USENIX Security Symposium.

[5]  Judea Pearl,et al.  Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach , 1982, AAAI.

[6]  Radu State,et al.  RiskRank: Security risk ranking for IP flow records , 2010, 2010 International Conference on Network and Service Management.

[7]  William T. Freeman,et al.  Understanding belief propagation and its generalizations , 2003 .

[8]  Nick Feamster,et al.  Building a Dynamic Reputation System for DNS , 2010, USENIX Security Symposium.

[9]  Sandeep Yadav,et al.  Detecting algorithmically generated malicious domain names , 2010, IMC '10.

[10]  Christos Faloutsos,et al.  Netprobe: a fast and scalable system for fraud detection in online auction networks , 2007, WWW '07.

[11]  Leyla Bilge,et al.  EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis , 2011, NDSS.

[12]  Roberto Perdisci,et al.  From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware , 2012, USENIX Security Symposium.

[13]  Thorsten Holz,et al.  As the net churns: Fast-flux botnet observations , 2008, 2008 3rd International Conference on Malicious and Unwanted Software (MALWARE).

[14]  Mathieu Bastian,et al.  Gephi: An Open Source Software for Exploring and Manipulating Networks , 2009, ICWSM.

[15]  Radu State,et al.  BotCloud: Detecting botnets using MapReduce , 2011, 2011 IEEE International Workshop on Information Forensics and Security.

[16]  Joseph M. Hellerstein,et al.  GraphLab: A New Framework For Parallel Machine Learning , 2010, UAI.

[17]  Florian Weimer,et al.  Passive DNS Replication , 2005 .

[18]  Vinod Yegneswaran,et al.  An empirical reexamination of global DNS behavior , 2013, SIGCOMM.

[19]  Christos Faloutsos,et al.  Polonium: Tera-Scale Graph Mining and Inference for Malware Detection , 2011 .

[20]  Felix C. Freiling,et al.  On Botnets That Use DNS for Command and Control , 2011, 2011 Seventh European Conference on Computer Network Defense.