Detecting fake anti-virus software distribution webpages

Attackers continually seek novel methods to distribute malware. Among the various approaches, fake anti-virus (AV) attacks are an active trend in malware distribution. In a fake AV attack, attackers disguise malware as legitimate anti-virus software and convince users to install it. Because web browsers have become the most popular applications for accessing online resources, webpages have become the dominant means of launching fake AV attacks. In this paper, we present DART, an automated and effective system for detecting fake AV webpages on the Internet. We propose a collection of novel features that characterize an unknown webpage, and we integrate them using statistical classifiers. These features profile a fake AV webpage from three aspects that are fundamental to its success, which yields high detection accuracy and suggests resistance to evasion attempts. We have performed an extensive evaluation on real fake AV webpages collected from the Internet. Experimental results demonstrate that DART achieves a high detection rate of 90.4% at a very low false positive rate of 0.2%.
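The evaluation result above is stated as a detection rate and a false positive rate. The short Python sketch below shows how these two figures are conventionally computed from a confusion matrix, assuming the standard definitions (detection rate = TP / (TP + FN), false positive rate = FP / (FP + TN)), with fake AV pages as the positive class. The `Confusion` type and the counts are hypothetical, chosen only to illustrate the arithmetic; they are not the paper's evaluation data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical counts from labeling webpages; positives are fake AV pages."""
    tp: int  # fake AV pages correctly flagged
    fn: int  # fake AV pages missed
    fp: int  # benign pages wrongly flagged
    tn: int  # benign pages correctly passed


def detection_rate(c: Confusion) -> float:
    """True positive rate: fraction of fake AV pages that are flagged."""
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: Confusion) -> float:
    """Fraction of benign pages that are wrongly flagged."""
    return c.fp / (c.fp + c.tn)


# Illustrative counts only (assumed), not taken from the paper.
example = Confusion(tp=904, fn=96, fp=2, tn=998)

print(f"detection rate: {detection_rate(example):.1%}")
print(f"false positive rate: {false_positive_rate(example):.1%}")
```

Note that the two rates have different denominators: the detection rate is measured only over fake AV pages and the false positive rate only over benign pages, so a detector can keep false alarms low even when benign pages vastly outnumber malicious ones.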
