Detection of Drive-by Download Attacks Using Machine Learning Approach

Drive-by download refers to attacks that automatically download malware to a user's computer without their knowledge or consent. This type of attack is accomplished by exploiting vulnerabilities in web browsers and plugins. The damage may include data leakage leading to financial loss. Traditional antivirus and intrusion detection systems are not effective against such attacks. Researchers have proposed many detection approaches, most of them based on passive blacklisting. A few have proposed dynamic classification techniques, but these suffer from clear shortcomings. In this paper, we propose a novel approach to detect web pages infected with drive-by downloads based on features extracted from their source code. We test 23 different machine learning classifiers using a data set of 5435 web pages and, based on detection accuracy, select the top five to build our detection model. The approach is expected to serve as a base for implementing and developing anti-drive-by-download programs. We also develop a graphical user interface program that allows the end user to examine a URL before visiting the website. The Bagged Trees classifier exhibited the highest accuracy of 90.1%, with a true positive rate of 96.24% and a false positive rate of 26.07%.
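To illustrate what extracting features from a page's source code could look like, the following is a minimal sketch using only the Python standard library. The abstract does not list the features actually used, so the four features counted here (script tags, iframes, hidden iframes, and calls to `eval`, `unescape`, or `fromCharCode`) are assumptions chosen for illustration, and the names `PageFeatureExtractor` and `extract_features` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed indicator: JavaScript calls often associated with obfuscated code.
# This list is illustrative only; it is not taken from the paper.
SUSPICIOUS_JS = re.compile(r"\b(eval|unescape|fromCharCode)\s*\(", re.IGNORECASE)


class PageFeatureExtractor(HTMLParser):
    """Counts a few hypothetical source-code features of a web page."""

    def __init__(self):
        super().__init__()
        self.counts = {
            "script_tags": 0,
            "iframe_tags": 0,
            "hidden_iframes": 0,
            "suspicious_js_calls": 0,
        }
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script":
            self.counts["script_tags"] += 1
            self._in_script = True
        elif tag == "iframe":
            self.counts["iframe_tags"] += 1
            # Treat zero-sized or display:none iframes as hidden.
            style = (attrs.get("style") or "").replace(" ", "").lower()
            if (attrs.get("width") == "0" or attrs.get("height") == "0"
                    or "display:none" in style):
                self.counts["hidden_iframes"] += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only inspect text that appears inside <script> elements.
        if self._in_script:
            self.counts["suspicious_js_calls"] += len(SUSPICIOUS_JS.findall(data))


def extract_features(html_source):
    """Return a dict of feature counts for one page's HTML source."""
    parser = PageFeatureExtractor()
    parser.feed(html_source)
    parser.close()
    return parser.counts
```

A feature dictionary of this kind, computed per page, could then be turned into a numeric vector and passed to any of the classifiers under comparison, such as a bagged ensemble of decision trees.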
