Feature-based Malicious URL and Attack Type Detection Using Multi-class Classification

Malicious URLs are a common threat to businesses, social networks, and net-banking. Existing approaches have focused on binary detection, i.e., deciding whether a URL is malicious or benign. Few studies address the detection of malicious URLs together with their attack types, yet knowing the attack type is necessary to adopt an effective countermeasure. This paper proposes a methodology to detect malicious URLs and the type of attack based on multi-class classification. We propose 42 new features of spam, phishing, and malware URLs that were not considered in earlier studies of malicious URL detection and attack type identification. Binary and multi-class datasets are constructed from 49,935 malicious and benign URLs: 26,041 benign and 23,894 malicious, the latter comprising 11,297 malware, 8,976 phishing, and 3,621 spam URLs. To evaluate the proposed approach, state-of-the-art supervised batch and online machine learning classifiers are applied to both datasets. Using the proposed URL features, the confidence-weighted learning classifier performs best, achieving 98.44% average detection accuracy (1.56% error rate) in the multi-class setting and 99.86% detection accuracy (0.14% error rate) in the binary setting.
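As a quick consistency check on the figures reported above, the following minimal sketch recomputes the dataset totals and the error rates from the stated counts and accuracies. The function and variable names are illustrative assumptions, not part of the paper, and the error rate is assumed to be simply 100% minus the detection accuracy.

```python
# Counts as reported in the abstract; names are hypothetical, for illustration only.
BENIGN = 26041
MALICIOUS_BY_TYPE = {"malware": 11297, "phishing": 8976, "spam": 3621}


def dataset_summary():
    """Return (malicious total, overall total, per-class share in percent)."""
    malicious = sum(MALICIOUS_BY_TYPE.values())
    total = BENIGN + malicious
    labels = {"benign": BENIGN, **MALICIOUS_BY_TYPE}
    shares = {name: round(100 * n / total, 2) for name, n in labels.items()}
    return malicious, total, shares


def error_rate(accuracy_percent):
    """Assumption: error rate is the complement of accuracy, in percent."""
    return round(100.0 - accuracy_percent, 2)


if __name__ == "__main__":
    malicious, total, shares = dataset_summary()
    print("malicious URLs:", malicious)
    print("all URLs:", total)
    print("class shares (%):", shares)
    print("multi-class error rate (%):", error_rate(98.44))
    print("binary error rate (%):", error_rate(99.86))
```

The per-class shares show that the four classes of the multi-class dataset are imbalanced, with spam the smallest class, which is one reason an average accuracy across classes is worth reading alongside per-class results.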
