Malicious URL Detection Based on Improved Multilayer Recurrent Convolutional Neural Network Model

Traditional malicious uniform resource locator (URL) detection methods rely excessively on matching rules formulated by network security personnel, and such rules cannot fully express the textual information in a URL. This paper therefore proposes an improved multilayer recurrent convolutional neural network model based on the YOLO algorithm to detect malicious URLs. First, according to the structural characteristics of the URL, single characters are mapped to dense vectors using word embedding, and these dense vectors take part in the training of the whole model. Then, a CSPDarknet neural network model based on the improved YOLO algorithm is used to extract features from the URL. Finally, the extracted features are passed to a bidirectional LSTM recurrent neural network, which classifies the URL as malicious or not. To verify the validity of the algorithm, a total of 200,000 URLs were collected, including 100,000 normal URLs labeled “good” and 100,000 malicious URLs labeled “bad”. The experimental results show that the method detects malicious URLs more quickly and effectively, with high accuracy and a high recall rate, compared with Text-RCNN, BRNN, and other models.
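
The first step, mapping single characters to dense vectors, can be illustrated with a minimal sketch. It is not the authors' implementation: the alphabet, the fixed length `MAX_LEN`, the dimension `EMBED_DIM`, the padding and unknown-character ids, and the random initialization are all assumptions chosen for illustration, and the helper names are hypothetical. In a real model the embedding table would be a trainable parameter updated together with the convolutional and recurrent layers.

```python
import random
import string

# Assumed settings: the abstract does not give the alphabet, length, or dimension.
ALPHABET = string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
PAD, UNK = 0, 1                      # reserved ids for padding and unknown characters
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 32                         # hypothetical fixed input length
EMBED_DIM = 4                        # hypothetical embedding dimension


def encode_url(url, max_len=MAX_LEN):
    """Map each character to an integer id, then truncate or right-pad to max_len."""
    ids = [CHAR_TO_ID.get(ch, UNK) for ch in url[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def init_embedding(vocab_size, dim, seed=0):
    """Build a randomly initialized lookup table; the padding row is all zeros."""
    rng = random.Random(seed)
    table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    table[PAD] = [0.0] * dim
    return table


def embed(ids, table):
    """Look up one dense vector per character id."""
    return [table[i] for i in ids]


EMBEDDING = init_embedding(len(ALPHABET) + 2, EMBED_DIM)
ids = encode_url("http://example.com/login?id=1")
vectors = embed(ids, EMBEDDING)
print(len(vectors), len(vectors[0]))  # sequence length and embedding dimension
```

The resulting `MAX_LEN` by `EMBED_DIM` matrix is the kind of input that a convolutional feature extractor followed by a bidirectional LSTM would consume.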
