Empirical analysis of factors affecting malware URL detection

Many organizations, from antivirus companies to motivated volunteers, maintain blacklists of URLs suspected of distributing malware in order to protect users. Detection rates can vary widely, but it is not known why. We posit that much variation can be explained by differences in the type of malware and differences in the blacklists themselves. To that end, we conducted an empirical analysis of 722 malware URLs submitted to the Malware Domain List (MDL) over 6 months in 2012-2013. We ran each URL through VirusTotal, a tool that allowed us to check each URL against 38 different malware URL blacklists, within an hour from when they were first blacklisted by the MDL. We followed up on each for two weeks following. We then ran logisitic regressions and Cox proportional hazard models to identify factors affecting blacklist accuracy and speed. We find that URLs belonging to known exploit kits such as Blackhole and Styx were more likely to be blacklisted and blacklisted quicker. We also found that blacklists that are used to actively block URLs are more effective than those that do not, and furthermore that paid services are more effective than free ones.

[1]  Ramana Rao Kompella,et al.  PhishNet: Predictive Blacklisting to Detect Phishing Attacks , 2010, 2010 Proceedings IEEE INFOCOM.

[2]  Thorsten Holz,et al.  An Empirical Analysis of Malware Blacklists , 2012, PIK Prax. Informationsverarbeitung Kommun..

[3]  D.,et al.  Regression Models and Life-Tables , 2022 .

[4]  Niels Provos,et al.  All Your iFRAMEs Point to Us , 2008, USENIX Security Symposium.

[5]  Farnam Jahanian,et al.  CloudAV: N-Version Antivirus in the Network Cloud , 2008, USENIX Security Symposium.

[6]  Ilir Gashi,et al.  Does Malware Detection Improve with Diverse AntiVirus Products? An Empirical Study , 2013, SAFECOMP.

[7]  Martín Abadi,et al.  deSEO: Combating Search-Result Poisoning , 2011, USENIX Security Symposium.

[8]  Tyler Moore,et al.  Fashion crimes: trending-term exploitation on the web , 2011, CCS '11.

[9]  Olivier Thonnard,et al.  An Experimental Study of Diversity with Off-the-Shelf AntiVirus Engines , 2009, 2009 Eighth IEEE International Symposium on Network Computing and Applications.

[10]  Lorrie Faith Cranor,et al.  An Empirical Analysis of Phishing Blacklists , 2009, CEAS 2009.