An Effective Approach for Malware Detection and Explanation via Deep Learning Analysis

Next-generation attackers often use Artificial Intelligence (AI) tools to generate malware variants that are deliberately designed to evade antivirus engines. In response, security defenders have proposed many AI-based approaches to detect the massive number of malware variants. However, most AI-based malware detection approaches output only a label, and that label is largely unexplainable. This lack of transparency has given rise to many black-box attacks, in which malware developers craft adversarial examples to evade AI-based malware detection systems. In this paper, we propose an effective approach for malware detection and explanation that locates malicious code snippets by explaining the decision of the malware classifier. First, we obtain the system call number sequence of the target sample using instrumentation tools in an elaborate sandbox. Second, we feed the mapped system call number sequence into a deep learning model that decides whether the target sample is benign or malicious. Third, we apply the Layer-wise Relevance Propagation algorithm to identify which slice of the sequence contributes most to the decision. Our evaluation shows that the approach achieves high classification accuracy (97.39%), reduces the neural network size by a factor of 20, and saves malware analysts time in locating malicious code snippets.
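
The three steps above can be illustrated with a minimal sketch. It is not the model or the exact procedure evaluated in this paper: the abstract does not specify the network architecture, so the sketch assumes a tiny two-layer ReLU network with random, untrained weights over one-hot encoded system call numbers, and it uses the epsilon rule, one common variant of Layer-wise Relevance Propagation. The function names `lrp_dense`, `top_window`, and `explain`, the window length, and the example sequence are all hypothetical.

```python
import numpy as np

def lrp_dense(a, w, b, r_out, eps=1e-6):
    """Epsilon-rule LRP for one dense layer: redistribute r_out onto the inputs a."""
    z = a @ w + b                               # pre-activations, shape (out,)
    z = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabilizer, avoids division by zero
    s = r_out / z
    return a * (w @ s)                          # relevance per input, shape (in,)

def top_window(relevance, window):
    """Return (start, end) of the contiguous slice with the largest summed relevance."""
    scores = np.convolve(relevance, np.ones(window), mode="valid")
    start = int(np.argmax(scores))
    return start, start + window

def explain(seq, n_syscalls, w1, b1, w2, b2, window=3):
    # Step 2 (assumed model): one-hot encode the mapped sequence, run a small ReLU net.
    x = np.eye(n_syscalls)[seq].ravel()
    h = np.maximum(0.0, x @ w1 + b1)
    logits = h @ w2 + b2
    cls = int(np.argmax(logits))
    # Step 3: start from the predicted class score and propagate relevance backwards.
    r_logits = np.zeros_like(logits)
    r_logits[cls] = logits[cls]
    r_h = lrp_dense(h, w2, b2, r_logits)
    r_x = lrp_dense(x, w1, b1, r_h)
    # Sum relevance over the one-hot dimension to get one score per sequence position.
    r_pos = r_x.reshape(len(seq), n_syscalls).sum(axis=1)
    return cls, float(logits[cls]), r_pos, top_window(r_pos, window)

# Hypothetical inputs: a short mapped system call sequence and random weights.
rng = np.random.default_rng(0)
seq = [3, 1, 4, 1, 5, 2, 6, 5]
n_syscalls, hidden = 8, 16
w1 = rng.normal(size=(len(seq) * n_syscalls, hidden))
b1 = np.zeros(hidden)
w2 = rng.normal(size=(hidden, 2))
b2 = np.zeros(2)
cls, score, r_pos, span = explain(seq, n_syscalls, w1, b1, w2, b2)
print(cls, span, np.round(r_pos, 3))
```

With zero biases, the per-position relevances sum approximately to the predicted class score, which is the conservation property that makes the relevance values interpretable as contributions to the decision. The returned span marks the slice of the sequence an analyst would inspect first under these assumptions.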