Controlling malware HTTP communications in dynamic analysis system using search engine

Malware is one of the most serious threats on the Internet. Although countermeasures have been developed, many users are still infected. Detecting and blocking communication from infected users on the network side would effectively mitigate the threat. To do so, we need to collect information about the destinations and payloads of malware communication, which is usually obtained through dynamic analysis. Because some malware, such as bots and downloaders, requires Internet access, the dynamic analysis environment must be connected to the Internet. Recent malware communicates with remote hosts over HTTP not only for command-and-control (C&C) and malware downloading but also for attacks. For secure dynamic analysis in an Internet-connected environment, it is therefore necessary to determine whether a destination is used for C&C or malware downloading and to allow connections only to such servers. We propose an Internet-connected dynamic analysis system that controls HTTP communication by using a search engine. To control HTTP connections, we built a support vector machine classifier based on the assumption that sites used for purposes such as C&C or malware downloading are harder to find and have lower backlink counts than benign sites. Our classifier, trained on popular URLs and URLs obtained from malware analysis, achieves 99.69% cross-validation accuracy. We also evaluated other well-known, popular benign sites with our classifier, and all of them were classified as benign. Our evaluation confirms that the classifier can distinguish benign sites, so the proposed dynamic analysis system is effective for safe analysis in an Internet-connected environment.
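
The classification and connection-control idea summarized above can be illustrated with a minimal sketch. It is not the system described here: the feature names (`hit_count`, `backlink_count`), the log scaling, the linear kernel, the training procedure (plain subgradient descent on the hinge loss), and the toy data are all assumptions made for illustration. Obtaining the two counts from a search engine is outside the scope of the sketch.

```python
import math

def features(hit_count, backlink_count):
    """Log-scale two assumed search-engine signals for a destination."""
    return [math.log1p(hit_count), math.log1p(backlink_count)]

def score(w, b, x):
    """Signed distance-like score: positive means 'looks benign'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(samples, labels, lam=0.01, lr=0.01, epochs=2000):
    """Linear SVM via full-batch subgradient descent on
    lam/2 * ||w||^2 + mean hinge loss (an illustrative trainer)."""
    n = len(samples)
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(samples, labels):
            if y * score(w, b, x) < 1:  # point violates the margin
                for i, xi in enumerate(x):
                    grad_w[i] -= y * xi / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def allow_connection(w, b, hit_count, backlink_count):
    """Allow only destinations that do NOT look benign, i.e. likely C&C
    or malware-download servers; traffic to benign-looking sites is
    blocked because it may be an attack."""
    return score(w, b, features(hit_count, backlink_count)) < 0

# Invented toy data (hit_count, backlink_count); not taken from the paper.
BENIGN = [(120000, 5000), (80000, 1200), (30000, 400), (500000, 20000)]
SUSPECT = [(0, 0), (2, 0), (5, 1), (0, 1)]

X = [features(h, k) for h, k in BENIGN + SUSPECT]
y = [+1] * len(BENIGN) + [-1] * len(SUSPECT)
w, b = train_linear_svm(X, y)
```

With this policy, a destination that is easy to find and heavily linked is treated as benign and the malware's request to it is blocked, while an obscure destination is treated as a likely C&C or download server and the request is let through for analysis.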