DeepReflect: Discovering Malicious Functionality through Binary Reconstruction

Deep learning has continued to show promising results for malware classification. However, to identify key malicious behaviors, malware analysts are still tasked with reverse engineering unknown malware binaries using static analysis tools, which can take hours. Although machine learning can be used to help identify important parts of a binary, supervised approaches are impractical due to the expense of acquiring a sufficiently large labeled dataset. To increase the productivity of static (or manual) reverse engineering, we propose DEEPREFLECT: a tool for localizing and identifying malware components within a malicious binary. To localize malware components, we use an unsupervised deep neural network in a novel way, and classify the components through a semi-supervised cluster analysis, where analysts incrementally provide labels during their daily work flow. The tool is practical since it requires no data labeling to train the localization model, and minimal/noninvasive labeling to train the classifier incrementally. In our evaluation with five malware analysts on over 26k malware samples, we found that DEEPREFLECT reduces the number of functions that an analyst needs to reverse engineer by 85% on average. Our approach also detects 80% of the malware components compared to 43% when using a signature-based tool (CAPA). Furthermore, DEEPREFLECT performs better with our proposed autoencoder than SHAP (an AI explanation tool). This is significant because SHAP, a state-of-the-art method, requires a labeled dataset and autoencoders do not.

[1]  David A. Wagner,et al.  Mimicry attacks on host-based intrusion detection systems , 2002, CCS '02.

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Salvatore J. Stolfo,et al.  Anomalous Payload-Based Worm Detection and Signature Generation , 2005, RAID.

[4]  Sung-Bae Cho,et al.  Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders , 2018, Inf. Sci..

[5]  Scott Lundberg,et al.  A Unified Approach to Interpreting Model Predictions , 2017, NIPS.

[6]  Ananthram Swami,et al.  The Limitations of Deep Learning in Adversarial Settings , 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[7]  Wenke Lee,et al.  Ether: malware analysis via hardware virtualization extensions , 2008, CCS.

[8]  Christopher Krügel,et al.  Detecting System Emulators , 2007, ISC.

[9]  Pascal Junod,et al.  Obfuscator-LLVM -- Software Protection for the Masses , 2015, 2015 IEEE/ACM 1st International Workshop on Software Protection.

[10]  James Newsome,et al.  Paragraph: Thwarting Signature Learning by Training Maliciously , 2006, RAID.

[11]  Gianluca Stringhini,et al.  MaMaDroid , 2019, ACM Trans. Priv. Secur..

[12]  Juan Caballero,et al.  AVclass: A Tool for Massive Malware Labeling , 2016, RAID.

[13]  K. P. Soman,et al.  Deep android malware detection and classification , 2017, 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI).

[14]  Igor Santos,et al.  Opcode sequences as representation of executables for data-mining-based unknown malware detection , 2013, Inf. Sci..

[15]  Heng Yin,et al.  Scalable Graph-based Bug Search for Firmware Images , 2016, CCS.

[16]  Ananthram Swami,et al.  Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks , 2015, 2016 IEEE Symposium on Security and Privacy (SP).

[17]  Benjamin C. M. Fung,et al.  BinClone: Detecting Code Clones in Malware , 2014, 2014 Eighth International Conference on Software Security and Reliability.

[18]  Salvatore J. Stolfo,et al.  Data mining methods for detection of new malicious executables , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[19]  Christopher Krügel,et al.  Static Disassembly of Obfuscated Binaries , 2004, USENIX Security Symposium.

[20]  Herbert Bos,et al.  Compiler-Agnostic Function Detection in Binaries , 2017, 2017 IEEE European Symposium on Security and Privacy (EuroS&P).

[21]  Nathan S. Netanyahu,et al.  DeepSign: Deep learning for automatic malware signature generation and classification , 2015, 2015 International Joint Conference on Neural Networks (IJCNN).

[22]  Hyrum S. Anderson,et al.  EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models , 2018, ArXiv.

[23]  Mansour Ahmadi,et al.  Microsoft Malware Classification Challenge , 2018, ArXiv.

[24]  Nick Feamster,et al.  Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces , 2010, NSDI.

[25]  Ramon G. Garcia,et al.  Classification of Malware programs using autoencoders based deep learning architecture and its application to the microsoft malware Classification challenge (BIG 2015) dataset , 2017, 2017 IEEE National Aerospace and Electronics Conference (NAECON).

[26]  Davide Balzarotti,et al.  SoK: Deep Packer Inspection: A Longitudinal Study of the Complexity of Run-Time Packers , 2015, 2015 IEEE Symposium on Security and Privacy.

[27]  Yevgeniy Vorobeychik,et al.  Large-Scale Identification of Malicious Singleton Files , 2017, CODASPY.

[28]  Guofei Gu,et al.  BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection , 2008, USENIX Security Symposium.

[29]  Ananthram Swami,et al.  Practical Black-Box Attacks against Machine Learning , 2016, AsiaCCS.

[30]  Wenke Lee,et al.  PolyUnpack: Automating the Hidden-Code Extraction of Unpack-Executing Malware , 2006, 2006 22nd Annual Computer Security Applications Conference (ACSAC'06).

[31]  Jon Barker,et al.  Malware Detection by Eating a Whole EXE , 2017, AAAI Workshops.

[32]  Michalis Polychronakis,et al.  Spotless Sandboxes: Evading Malware Analysis Systems Using Wear-and-Tear Artifacts , 2017, 2017 IEEE Symposium on Security and Privacy (SP).

[33]  B. Karp,et al.  Autograph: Toward Automated, Distributed Worm Signature Detection , 2004, USENIX Security Symposium.

[34]  Jeffrey S. Foster,et al.  An Observational Investigation of Reverse Engineers' Processes , 2019, USENIX Security Symposium.

[35]  Vitaly Shmatikov,et al.  Abusing File Processing in Malware Detectors for Fun and Profit , 2012, 2012 IEEE Symposium on Security and Privacy.

[36]  Yanfang Ye,et al.  DL 4 MD : A Deep Learning Framework for Intelligent Malware Detection , 2016 .

[37]  Ben Y. Zhao,et al.  Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks , 2019, 2019 IEEE Symposium on Security and Privacy (SP).

[38]  Yanjun Qi,et al.  Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks , 2017, NDSS.

[39]  B. S. Manjunath,et al.  SigMal: a static signal processing based malware triage , 2013, ACSAC.

[40]  Jeffrey S. Foster,et al.  An Observational Investigation of Reverse Engineers' Process and Mental Models , 2019, CHI Extended Abstracts.

[41]  Fabio Roli,et al.  Poisoning behavioral malware clustering , 2014, AISec '14.

[42]  Wenke Lee,et al.  Misleading worm signature generators using deliberate noise injection , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[43]  Mahmood Yousefi-Azar,et al.  Autoencoder-based feature learning for cyber security applications , 2017, 2017 International Joint Conference on Neural Networks (IJCNN).

[44]  No License,et al.  Intel ® 64 and IA-32 Architectures Software Developer ’ s Manual Volume 3 A : System Programming Guide , Part 1 , 2006 .

[45]  Le Song,et al.  Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection , 2018 .

[46]  Razvan Pascanu,et al.  Malware classification with recurrent networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[47]  Christopher Krügel,et al.  Efficient Detection of Split Personalities in Malware , 2010, NDSS.

[48]  Christopher Krügel,et al.  Limits of Static Analysis for Malware Detection , 2007, Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007).

[49]  Muhammad Zubair Shafiq,et al.  PE-Miner: Mining Structural Information to Detect Malicious Executables in Realtime , 2009, RAID.

[50]  Christopher Krügel,et al.  Scalable, Behavior-Based Malware Clustering , 2009, NDSS.

[51]  Giovanni Vigna,et al.  Neurlux: dynamic malware analysis without feature engineering , 2019, ACSAC.

[52]  Yuval Elovici,et al.  Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection , 2018, NDSS.

[53]  Igor Santos,et al.  Semi-supervised Learning for Unknown Malware Detection , 2011, DCAI.

[54]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[55]  Christopher Krügel,et al.  Effective and Efficient Malware Detection at the End Host , 2009, USENIX Security Symposium.

[56]  Wenke Lee,et al.  Polymorphic Blending Attacks , 2006, USENIX Security Symposium.

[57]  James Newsome,et al.  Polygraph: automatically generating signatures for polymorphic worms , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).