Malware Detection Based on Code Visualization and Two-Level Classification

Malware creators generate new malicious software samples by making minor changes in previously generated code, in order to reuse malicious code, as well as to go unnoticed from signature-based antivirus software. As a result, various families of variations of the same initial code exist today. Visualization of compiled executables for malware analysis has been proposed several years ago. Visualization can greatly assist malware classification and requires neither disassembly nor code execution. Moreover, new variations of known malware families are instantly detected, in contrast to traditional signature-based antivirus software. This paper addresses the problem of identifying variations of existing malware visualized as images. A new malware detection system based on a two-level Artificial Neural Network (ANN) is proposed. The classification is based on file and image features. The proposed system is tested on the ‘Malimg’ dataset consisting of the visual representation of well-known malware families. From this set some important image features are extracted. Based on these features, the ANN is trained. Then, this ANN is used to detect and classify other samples of the dataset. Malware families creating a confusion are classified by a second level of ANNs. The proposed two-level ANN method excels in simplicity, accuracy, and speed; it is easy to implement and fast to run, thus it can be applied to antivirus software, smart firewalls, web applications, etc.