Intelligent Malicious URL Detection with Feature Analysis

Website security is an important issue for protecting Internet users. Traditionally, blacklists of malicious websites are maintained, but they do not help detect new malicious websites. This work proposes a machine learning architecture for intelligently detecting malicious URLs. Forty-one features of malicious URLs are extracted from domain, Alexa, and obfuscation data processing. The ANOVA (Analysis of Variance) test and the XGBoost (eXtreme Gradient Boosting) algorithm are used to identify the 17 most important features. Finally, the dataset is used to train the XGBoost classifier, which achieves a detection accuracy of more than 99%.
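To make the ANOVA-based ranking step concrete, the following is a minimal sketch of scoring features with a one-way ANOVA F statistic and keeping the highest-scoring ones. It is not the authors' implementation: the feature names (`url_length`, `digit_count`, `path_depth`), the toy values, and the helper functions are all hypothetical, and the XGBoost importance ranking and classifier training described above are not reproduced here.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F statistic for a single feature across class groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2:
        raise ValueError("ANOVA needs at least two classes")
    grand = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_k_features(rows, labels, names, k):
    """Rank features by F statistic (highest first) and keep the top k."""
    scores = {
        name: anova_f([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores

# Hypothetical toy data: label 0 = benign, label 1 = malicious.
names = ["url_length", "digit_count", "path_depth"]
rows = [
    [20, 1, 2], [25, 0, 3], [22, 2, 1], [24, 1, 3],   # benign
    [60, 8, 2], [75, 5, 3], [68, 9, 1], [70, 6, 3],   # malicious
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

selected, scores = top_k_features(rows, labels, names, k=2)
print(selected)
```

In this toy data, `path_depth` has the same distribution in both classes, so its F statistic is zero and it is dropped, while the two features whose class means differ are kept. A feature that separates the classes well has a large between-class variance relative to its within-class variance, which is what the F statistic measures.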
