A Novel Framework for Learning to Detect Malicious Web Pages

Malicious web pages are a widely recognized threat to the security of the web. Such pages launch so-called drive-by download attacks that can give an attacker complete control of a user’s computer for illegitimate purposes. Even a single visit to a malicious web page can trigger the automatic download and installation of malware executables. In this paper, we propose a novel approach for automatically classifying web pages as either malicious or benign using supervised machine learning. Our approach learns to detect malicious web pages exclusively from HTTP session information (e.g., HTTP session headers and the domains of requests and responses). On a corpus of 50,000 benign web pages and 500 malicious web pages, our approach detects 92.2% of the malicious web pages with a low false positive rate of 0.1%.
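
To make the two reported rates concrete, the following minimal sketch shows how a detection rate and a false positive rate are computed from confusion-matrix counts. The counts are not taken from our experiments: they are hypothetical values obtained by assuming, purely for illustration, that the stated rates apply to the entire corpus in a single evaluation. The function name `detection_metrics` is likewise an illustrative choice.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (detection rate, false positive rate, precision).

    tp: malicious pages flagged as malicious
    fn: malicious pages missed
    fp: benign pages flagged as malicious
    tn: benign pages correctly passed
    """
    detection_rate = tp / (tp + fn)        # also called recall or true positive rate
    false_positive_rate = fp / (fp + tn)
    precision = tp / (tp + fp)             # share of alarms that are truly malicious
    return detection_rate, false_positive_rate, precision


# Corpus sizes stated in the abstract.
N_MALICIOUS = 500
N_BENIGN = 50_000

# ASSUMPTION: hypothetical counts implied by applying the stated rates
# to the whole corpus; the actual evaluation protocol may differ.
tp = round(0.922 * N_MALICIOUS)
fp = round(0.001 * N_BENIGN)
fn = N_MALICIOUS - tp
tn = N_BENIGN - fp

dr, fpr, prec = detection_metrics(tp, fn, fp, tn)
print(f"detection rate:      {dr:.3f}")
print(f"false positive rate: {fpr:.3f}")
print(f"precision:           {prec:.3f}")
```

Precision is included because the corpus is heavily imbalanced (100 benign pages per malicious page): with so many benign pages, even a small false positive rate yields a number of false alarms that is not negligible relative to the number of true detections.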