Malicious URL Detection Based on Associative Classification

Cybercriminals use malicious URLs as distribution channels to propagate malware over the web. Attackers exploit browser vulnerabilities to install malware that gives them remote access to the victim's computer. Most malware aims to gain access to a network, exfiltrate sensitive information, and secretly monitor targeted computer systems. This paper presents a data mining approach, classification based on association (CBA), that detects malicious URLs using features drawn from the URL and from the webpage content. The CBA algorithm treats a training dataset of URLs as historical data, discovers association rules in it, and builds a classifier from those rules. The experimental results show that CBA performs comparably to benchmark classification algorithms, achieving 95.8% accuracy with low false positive and false negative rates.
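
The rule-discovery and classifier-building steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the feature names `has_ip` and `has_iframe`, the toy dataset, and the support and confidence thresholds are all hypothetical, rules are enumerated by brute force rather than with Apriori-style candidate generation, and the classifier-building step is a simplified form of the database-coverage idea used in CBA.

```python
from collections import Counter
from itertools import combinations


def matches(row, itemset):
    """True if the row contains every (feature, value) pair in the itemset."""
    return all(row.get(feature) == value for feature, value in itemset)


def mine_rules(rows, labels, min_support=0.2, min_confidence=0.6, max_len=2):
    """Enumerate class association rules (itemset -> label) by brute force."""
    n = len(rows)
    itemset_counts = Counter()  # how often each itemset occurs
    rule_counts = Counter()     # how often each itemset occurs with each label
    for row, label in zip(rows, labels):
        items = sorted(row.items())
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                itemset_counts[combo] += 1
                rule_counts[(combo, label)] += 1
    rules = []
    for (combo, label), hits in rule_counts.items():
        support = hits / n
        confidence = hits / itemset_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, label, support, confidence))
    # Precedence: higher confidence, then higher support, then shorter rule.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def build_classifier(rows, labels, rules):
    """Keep a rule only if it correctly classifies at least one uncovered row."""
    remaining = list(zip(rows, labels))
    kept = []
    for combo, label, _support, _confidence in rules:
        covered = [(r, y) for r, y in remaining if matches(r, combo)]
        if any(y == label for _, y in covered):
            kept.append((combo, label))
            remaining = [(r, y) for r, y in remaining if not matches(r, combo)]
    # Default class: majority of uncovered rows, or of all rows if none remain.
    pool = remaining or list(zip(rows, labels))
    default = Counter(y for _, y in pool).most_common(1)[0][0]
    return kept, default


def predict(row, kept, default):
    """Return the label of the first matching rule, else the default class."""
    for combo, label in kept:
        if matches(row, combo):
            return label
    return default


# Hypothetical toy training data: one dict of binary features per URL.
rows = [
    {"has_ip": True, "has_iframe": True},
    {"has_ip": True, "has_iframe": False},
    {"has_ip": False, "has_iframe": True},
    {"has_ip": False, "has_iframe": False},
    {"has_ip": False, "has_iframe": False},
    {"has_ip": False, "has_iframe": False},
    {"has_ip": False, "has_iframe": False},
]
labels = ["malicious"] * 3 + ["benign"] * 4

rules = mine_rules(rows, labels)
kept, default = build_classifier(rows, labels, rules)
print(predict({"has_ip": True, "has_iframe": False}, kept, default))
```

Ordering rules by confidence and then support means the most reliable rule that matches a URL decides its label, and the default class handles URLs that no rule covers. Continuous features would first need to be discretized before rules of this form can be mined.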
