A Malicious URL Detection Model Based on Convolutional Neural Network

With the development of Internet technology, network security faces diverse threats. In particular, attackers can spread malicious uniform resource locators (URLs) to carry out attacks such as phishing and spam. Research on malicious URL detection is therefore significant for defending against these attacks. However, current research still has problems: malicious features cannot be extracted efficiently, and some existing detection methods are easy for attackers to evade. To solve these problems, we design a malicious URL detection model based on a dynamic convolutional neural network (DCNN). The model adds a folding layer to the original multilayer convolutional network and replaces the pooling layer with a k-max-pooling layer. In the dynamic convolution algorithm, the width of the feature maps in the intermediate layers depends on the dimension of the input vector. Moreover, the pooling layer parameters are adjusted dynamically according to the length of the URL input and the depth of the current convolution layer, which helps extract deeper features over a wider range. We also propose a new embedding method in which word embedding based on character embedding is used to learn the vector representation of a URL. We conduct two groups of comparative experiments. In the first group, three experiments use the same network structure with different embedding methods; the results show that word embedding based on character embedding achieves higher accuracy. In the second group, three experiments use the embedding method proposed in this paper with different network structures to determine which network is most suitable for our model. These experiments show that the model designed in this paper achieves the highest accuracy (98%) in detecting malicious URLs.
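The abstract names three mechanisms (a depth- and length-dependent pooling size, k-max pooling, and folding) without giving formulas. The following is a minimal sketch of how they are commonly defined in the original DCNN of Kalchbrenner et al. (2014), which we assume here; the paper's own definitions may differ. The function names, the `k_top` parameter, and the rule k = max(k_top, ceil((L - l) / L * s)) are assumptions taken from that formulation, not from this abstract.

```python
def dynamic_k(layer, total_layers, seq_len, k_top):
    """Pooling size for convolution layer `layer` (1-indexed).

    Assumed rule: k = max(k_top, ceil((L - l) / L * s)), where L is the
    number of convolution layers, l the current layer, s the input length.
    """
    numerator = (total_layers - layer) * seq_len
    scaled = -(-numerator // total_layers)  # integer ceiling division
    return max(k_top, scaled)


def k_max_pool(row, k):
    """Keep the k largest values of `row`, preserving their original order."""
    k = min(k, len(row))
    top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
    return [row[i] for i in sorted(top)]


def fold(rows):
    """Sum each pair of adjacent rows element-wise, halving the row count."""
    return [[a + b for a, b in zip(rows[i], rows[i + 1])]
            for i in range(0, len(rows) - 1, 2)]


if __name__ == "__main__":
    # Hypothetical setting: 3 convolution layers, an 18-token URL, k_top = 3.
    for layer in (1, 2, 3):
        print(layer, dynamic_k(layer, 3, 18, 3))
    print(k_max_pool([1, 5, 2, 9, 3], 3))
    print(fold([[1, 2], [3, 4], [5, 6], [7, 8]]))
```

Under this assumed rule, shallow layers keep more values for long URLs, and the pooling size shrinks toward the fixed `k_top` as depth increases, which matches the abstract's description of parameters that depend on both the input length and the current layer depth.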

[20]  Zhiqiang Wang,et al.  A Malicious URL Detection Model Based on Convolutional Neural Network , 2020, SocialSec.
