URLCam: Toolkit for malicious URL analysis and modeling

Web technology has become an indispensable part of daily life, supporting almost all human activities. At the same time, cyberattacks are on the rise in today's Web-driven world. Effective countermeasures for the analysis and detection of malicious websites are therefore crucial to combating the growing threats to cybersecurity. In this paper, we systematically reviewed the state-of-the-art techniques and identified about 230 features of malicious websites, which we classify as internal or external. We also developed a toolkit for the analysis and modeling of malicious websites. The toolkit implements several types of feature extraction methods and machine learning algorithms, which can be used to analyze and compare different approaches to detecting malicious URLs. It also incorporates options such as feature selection and imbalanced learning, and it can be extended with further functionality and generalization capabilities. Finally, we demonstrate some use cases on different datasets.
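
To make the idea of URL-derived features concrete, the following is a minimal sketch of extracting a handful of lexical features from a URL string using only the Python standard library. The function name `lexical_features` and the specific features chosen are illustrative assumptions; they are not the toolkit's API and are not claimed to match the roughly 230 features identified in the paper.

```python
import re
from urllib.parse import urlparse, parse_qs


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features of a URL.

    Hypothetical helper for illustration only; the feature set is an
    assumption and not the one implemented by the toolkit.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is None when the URL has no netloc
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        # Loose dotted-quad check; it does not validate octet ranges.
        "host_is_ipv4": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "uses_https": parsed.scheme == "https",
        "num_query_params": len(parse_qs(parsed.query)),
    }


if __name__ == "__main__":
    for name, value in lexical_features("http://192.168.0.1/login?user=a&id=2").items():
        print(f"{name}: {value}")
```

A dictionary of this kind can be turned into one row of a feature matrix and passed to any standard classifier, which is the general workflow the abstract describes.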
