Phishing URLs Detection Using Sequential and Parallel ML Techniques: Comparative Analysis

World Wide Web services are a vital part of daily life and are reached through uniform resource locators (URLs). Cybercriminals constantly adapt to new security technologies and use URLs to exploit vulnerabilities for illicit gain, such as stealing users’ personal and sensitive data. This can lead to financial loss, reputational damage, ransomware, the spread of malicious infections, and catastrophic cyber-attacks such as phishing. Phishing attacks are recognized as the leading source of data breaches and the most prevalent form of online fraud. Artificial intelligence (AI)-based techniques such as machine learning (ML) and deep learning (DL) have proven effective in detecting phishing attacks. Nevertheless, sequential ML can be time intensive, inefficient for real-time detection, and unable to handle vast amounts of data. Parallel computing techniques can help build precise, robust, and effective ML models for detecting phishing attacks in less computation time. In this study, we therefore used various multiprocessing and multithreading techniques in Python to train ML and DL models. The dataset comprised 54 K records for training and 12 K for testing. Five experiments were carried out: the first used sequential execution, and the other four used parallel execution techniques (threading using the Python parallel backend, threading using the Python parallel backend with a specified number of jobs, manual threading, and multiprocessing using the Python parallel backend). Four models were used in the experiments: random forest (RF), naïve Bayes (NB), convolutional neural network (CNN), and long short-term memory (LSTM). Overall, the parallel experiments performed well and reduced computation time relative to sequential execution. Finally, a comprehensive comparative analysis of the experiments was performed.
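To make the comparison between sequential execution and manual threading concrete, the following is a minimal sketch using only the Python standard library. It is not the authors' code: `train_stub` is a hypothetical placeholder that merely sleeps instead of fitting a model, the helper names are assumptions, and speedup is taken here as sequential time divided by parallel time.

```python
import threading
import time


def train_stub(name, results, seconds=0.05):
    """Hypothetical stand-in for fitting one model; it only sleeps."""
    time.sleep(seconds)
    results[name] = f"{name} trained"


def run_sequential(names):
    """Run the stub for each model one after another and time the loop."""
    results = {}
    start = time.perf_counter()
    for name in names:
        train_stub(name, results)
    return results, time.perf_counter() - start


def run_threaded(names):
    """Start one thread per model, wait for all of them, and time the run."""
    results = {}
    threads = [
        threading.Thread(target=train_stub, args=(name, results))
        for name in names
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.perf_counter() - start


def speedup(t_sequential, t_parallel):
    """Speedup as the ratio of sequential time to parallel time."""
    return t_sequential / t_parallel


# Model labels taken from the abstract; the work done per label is simulated.
MODELS = ["RF", "NB", "CNN", "LSTM"]

seq_results, t_seq = run_sequential(MODELS)
par_results, t_par = run_threaded(MODELS)
print(f"sequential: {t_seq:.3f} s, threaded: {t_par:.3f} s, "
      f"speedup: {speedup(t_seq, t_par):.2f}x")
```

Because the stub sleeps, the threads overlap almost completely. Real training code benefits from threads only where the underlying library releases Python's global interpreter lock, which is one reason a multiprocessing variant is worth comparing against.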
