Fine-Grained Mining and Classification of Malicious Web Pages

As the World Wide Web continues to expand, malicious web pages used for phishing, malware distribution, and spamming are spreading rapidly and pose a serious threat to users. Existing work on detecting malicious web pages and identifying their threat types has a key shortcoming: most studies target only a single attack type. In this paper, we extract a variety of web page features and use machine learning algorithms to build an efficient classifier that both detects malicious web pages and identifies all popular threat types. The features are derived from the HTML content, the associated JavaScript code, and the corresponding URL. We collected 1000 benign web pages and 1500 malicious web pages as the experimental data set. The experimental results show that our method achieves an accuracy of over 95% in detecting malicious web pages and over 88% in identifying threat types.
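
To make the three feature sources concrete, the following is a minimal sketch of how URL, HTML, and JavaScript features might be gathered into one feature vector. It is not the feature set used in this paper: the function name `extract_features`, the helper class `_TagCounter`, and every feature name and heuristic below are illustrative assumptions, chosen only because they can be computed with the Python standard library.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class _TagCounter(HTMLParser):
    """Hypothetical helper: counts a few tags and collects inline script text."""

    def __init__(self):
        super().__init__()
        self.iframes = 0      # number of <iframe> start tags
        self.hidden = 0       # elements hidden through an inline style
        self.scripts = []     # text found inside <script> elements
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe":
            self.iframes += 1
        if tag == "script":
            self._in_script = True
        # Normalize the inline style so "display: none" and "display:none" match.
        style = (attrs.get("style") or "").replace(" ", "").lower()
        if "display:none" in style or "visibility:hidden" in style:
            self.hidden += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts.append(data)


def extract_features(url, html):
    """Return a dict of illustrative URL, HTML, and JavaScript features."""
    host = urlparse(url).hostname or ""
    counter = _TagCounter()
    counter.feed(html)
    js = "\n".join(counter.scripts)
    return {
        # URL features (assumed examples)
        "url_length": len(url),
        "url_host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "url_dot_count": host.count("."),
        # HTML features (assumed examples)
        "html_iframe_count": counter.iframes,
        "html_hidden_count": counter.hidden,
        # JavaScript features (assumed examples)
        "js_eval_count": len(re.findall(r"\beval\s*\(", js)),
        "js_unescape_count": len(re.findall(r"\bunescape\s*\(", js)),
    }


sample_url = "http://192.168.0.1/login.php"
sample_html = (
    '<html><body><iframe src="x" style="display: none"></iframe>'
    '<script>eval(unescape("%61"))</script></body></html>'
)
features = extract_features(sample_url, sample_html)
```

A dictionary of this kind could be converted to a numeric vector and passed to any standard classifier, first to separate benign from malicious pages and then to assign a threat type to the pages flagged as malicious.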