Learning URL Embedding for Malicious Website Detection

The emergence of artificial intelligence technology has promoted the development of the Internet of Things (IoT). However, this promising technology can encounter serious security problems when accessing the Internet. A malicious website can disguise itself as a legitimate one in order to obtain users' private information. It is therefore important to detect malicious websites with tools such as machine learning (ML) algorithms, which make it easier to identify abnormal information hidden in massive volumes of traffic. Because good features greatly improve even a strong ML model, a considerable amount of feature engineering must usually be performed by hand. In this article, we propose an unsupervised learning algorithm that learns URL embeddings. We also explore key parameters of the domain embedding model in order to obtain effective domain features.
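The abstract does not say how the URL embedding is learned. As a rough illustration only, one common unsupervised approach is to treat each URL as a "sentence" of tokens and train word2vec-style skip-gram embeddings with negative sampling. The sketch below assumes that approach; it is not the authors' method. The tokenizer, the function names `tokenize_url`, `skipgram_pairs`, and `train_sgns`, the uniform negative sampling, and all default values are hypothetical choices made for this example.

```python
import math
import random
import re
from typing import Dict, List, Tuple


def tokenize_url(url: str) -> List[str]:
    """Hypothetical tokenizer: split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def sigmoid(x: float) -> float:
    # Clamp the input to avoid overflow in math.exp for extreme scores.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def train_sgns(urls: List[str], dim: int = 8, window: int = 2, negatives: int = 2,
               epochs: int = 50, lr: float = 0.05, seed: int = 0) -> Dict[str, List[float]]:
    """Tiny skip-gram with negative sampling, trained by plain SGD.

    Simplification (assumption): negatives are drawn uniformly from the
    vocabulary rather than from a smoothed unigram distribution.
    """
    rng = random.Random(seed)
    corpus = [tokenize_url(u) for u in urls]
    vocab = sorted({t for toks in corpus for t in toks})
    # Input (token) vectors start small and random; output (context) vectors start at zero.
    emb = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    ctx = {w: [0.0] * dim for w in vocab}
    pairs = [p for toks in corpus for p in skipgram_pairs(toks, window)]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for center, context in pairs:
            v = emb[center]
            grad_v = [0.0] * dim
            # One positive target plus `negatives` randomly sampled negative targets.
            targets = [(context, 1.0)] + [(rng.choice(vocab), 0.0) for _ in range(negatives)]
            for word, label in targets:
                u = ctx[word]
                g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                for k in range(dim):
                    grad_v[k] += g * u[k]  # accumulate the update for the center vector
                    u[k] += g * v[k]       # update the context vector with the old center vector
            for k in range(dim):
                v[k] += grad_v[k]
    return emb
```

For example, `train_sgns(["http://login.example.com/verify", "http://example.com/home"], dim=16, window=3)` returns a 16-dimensional vector for each token. In a model of this kind, the context window size and the embedding dimension are natural candidates for the "key parameters" the abstract mentions, although the abstract does not state which parameters the authors actually studied.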
