MalNet: A Large-Scale Image Database of Malicious Software

Computer vision is playing an increasingly important role in automated malware detection with the rise of the image-based binary representation. These binary images are fast to generate, require no feature engineering, and are resilient to popular obfuscation methods. Significant research has been conducted in this area; however, it has been restricted to small-scale or private datasets that only a few industry labs and research teams can access. This lack of availability hinders examination of existing work, development of new research, and dissemination of ideas. We release MalNet-Image, the largest public cybersecurity image database, offering 24x more images and 70x more classes than existing databases (available at https://mal-net.org). MalNet-Image contains over 1.2 million malware images across 47 types and 696 families, democratizing image-based malware capabilities by enabling researchers and practitioners to evaluate techniques that were previously reported only in proprietary settings. We report the first million-scale malware detection results on binary images. MalNet-Image unlocks new and unique opportunities to advance the frontiers of machine learning, enabling new research directions into vision-based cyber defenses, multi-class imbalanced classification, and interpretable security.
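
To illustrate why an image-based binary representation is fast to generate and needs no feature engineering, the following is a minimal sketch of one common approach: reading a file's raw bytes as 8-bit grayscale pixels and reshaping them into rows of fixed width. The function name `bytes_to_image`, the default width of 256, and the zero-padding of the final row are assumptions made for this illustration; the sketch is not necessarily the procedure used to build MalNet-Image.

```python
import numpy as np


def bytes_to_image(data: bytes, width: int = 256) -> np.ndarray:
    """Map raw bytes to a 2D grayscale array (hypothetical helper).

    Each byte becomes one pixel with intensity 0-255. The final row is
    zero-padded when len(data) is not a multiple of `width`.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = np.frombuffer(data, dtype=np.uint8)
    # Ceiling division, with at least one row so empty input still yields an image.
    height = max(1, -(-len(pixels) // width))
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(pixels)] = pixels
    return padded.reshape(height, width)


if __name__ == "__main__":
    # Ten bytes laid out four per row give a 3 x 4 image with two padded pixels.
    example = bytes_to_image(bytes(range(10)), width=4)
    print(example.shape)
```

Because the mapping is a plain reshape of the byte stream, its cost is linear in the file size, and the resulting array can be passed to any standard image classifier after resizing.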
