Microsoft Malware Classification Challenge

The Microsoft Malware Classification Challenge was announced in 2015 together with the release of a dataset of nearly 0.5 terabytes, containing the disassembly and bytecode of more than 20K malware samples. Beyond its use in the Kaggle competition, the dataset has become a standard benchmark for research on modeling malware behaviour. To date, it has been cited in more than 50 research papers. Here we provide a high-level comparison of the publications citing the dataset. The comparison makes it easier to identify potential research directions in this field and to evaluate the performance of future work on the dataset.

[1] Chunming Qiao, et al. SPABox: Safeguarding Privacy During Deep Packet Inspection at a MiddleBox, 2017, IEEE/ACM Transactions on Networking.

[2] Edward Raff, et al. Lempel-Ziv Jaccard Distance, an Effective Alternative to Ssdeep and Sdhash, 2017, Digit. Investig.

[3] Philip K. Chan, et al. Learning a Neural-network-based Representation for Open Set Recognition, 2018, SDM.

[4] Ilia Nouretdinov, et al. Transcend: Detecting Concept Drift in Malware Classification Models, 2017, USENIX Security Symposium.

[5] Morten Oscar Østbye. Multinomial malware classification based on call graphs, 2017.

[6] Bhavani M. Thuraisingham, et al. Malware Collection and Analysis, 2017, 2017 IEEE International Conference on Information Reuse and Integration (IRI).

[7] Rauf Izmailov, et al. Feature Cultivation in Privileged Information-augmented Detection, 2017, IWSPA@CODASPY.

[8] Yuxin Ding, et al. Malware detection based on deep learning algorithm, 2019.

[9] Lingyu Wang, et al. On the Feasibility of Malware Authorship Attribution, 2016, FPS.

[10] Yong Qi, et al. Detecting Malware with an Ensemble Method Based on Deep Neural Network, 2018, Secur. Commun. Networks.

[11] Yong Wang, et al. Research on Malicious Code Analysis Method Based on Semi-supervised Learning, 2017.

[12] K. P. Soman, et al. Deep Learning for Network Flow Analysis and Malware Classification, 2017, SSCC.

[13] Zhang Di, et al. Projecting "Better Than Randomly": How to Reduce the Dimensionality of Very Large Datasets in a Way That Outperforms Random Projections, 2016.

[14] Barry Smyth, et al. Efficient Sequence Regression by Learning Linear Models in All-Subsequence Space, 2017, ECML/PKDD.

[15] Ananthram Swami, et al. Detection under Privileged Information, 2016, AsiaCCS.

[16] David Clark, et al. ITect: Scalable Information Theoretic Similarity for Malware Detection, 2016, ArXiv.

[17] Shengchao Qin, et al. Effective Malware Detection Based on Behaviour and Data Features, 2017, SmartCom.

[18] Stephen Kuhn, et al. Fast Model Learning for the Detection of Malicious Digital Documents, 2017.

[19] Mitchell Mays, et al. Feature Selection for Malware Classification, 2017, MAICS.

[20] Thomas Barabosch, et al. Quincy: Detecting Host-Based Code Injection Attacks in Memory Dumps, 2017, DIMVA.

[21] Jianguo Jiang, et al. Using Multi-features and Ensemble Learning Method for Imbalanced Malware Classification, 2016, 2016 IEEE Trustcom/BigDataSE/ISPA.

[22] Yaohang Li, et al. Malware Sequence Alignment, 2016, 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom).

[23] Jianguo Jiang, et al. Based on Multi-features and Clustering Ensemble Method for Automatic Malware Categorization, 2017, 2017 IEEE Trustcom/BigDataSE/ICESS.

[24] Om Patri, et al. Discovering Malware with Time Series Shapelets, 2017, HICSS.

[25] Ding Yuxin, et al. Malware detection based on deep learning algorithm, 2017, Neural Computing and Applications.

[26] Charles A. Fowler, et al. A hybrid intelligence/multi-agent system for mining information assurance data, 2015.

[27] Konstantin Berlin, et al. Deep neural network based malware detection using two dimensional binary program features, 2015, 2015 10th International Conference on Malicious and Unwanted Software (MALWARE).

[28] Mahmood Yousefi-Azar, et al. Autoencoder-based feature learning for cyber security applications, 2017, 2017 International Joint Conference on Neural Networks (IJCNN).

[29] X. Hu, et al. Scalable malware classification with multifaceted content features and threat intelligence, 2016, IBM J. Res. Dev.

[30] Songqing Yue, et al. Imbalanced Malware Images Classification: a CNN based Approach, 2017, ArXiv.

[31] Julian Schütte, et al. WebEye - Automated Collection of Malicious HTTP Traffic, 2018, ArXiv.

[32] Benny Pinkas, et al. Adversarial Examples on Discrete Sequences for Beating Whole-Binary Malware Detection, 2018, ArXiv.

[33] Yanjun Qi, et al. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks, 2017, NDSS.

[34] Rebecca Schuller Borbely, et al. On normalized compression distance and large malware, 2015, Journal of Computer Virology and Hacking Techniques.

[35] Yanjun Qi, et al. Adversarial-Playground: A visualization suite showing how adversarial examples fool deep learning, 2017, 2017 IEEE Symposium on Visualization for Cyber Security (VizSec).

[36] Felan Carlo C. Garcia, et al. Random Forest for Malware Classification, 2016, ArXiv.

[37] Hae-Jung Kim, et al. Image-Based Malware Classification Using Convolutional Neural Network, 2017, CSA/CUTE.

[38] Evgeny Burnaev, et al. One-Class SVM with Privileged Information and Its Application to Malware Detection, 2016, 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW).

[39] Philip K. Chan, et al. Scalable Function Call Graph-based Malware Classification, 2017, CODASPY.

[40] Tyler Moore, et al. Polymorphic malware detection using sequence classification methods and ensembles, 2017, EURASIP J. Inf. Secur.

[41] Joachim Hansen. The study of keyword search in open source search engines and digital forensics tools with respect to the needs of cyber crime investigations, 2017.

[42] Mohammad Imran. Evaluation of Hidden Markov Model for Malware Behavioral Classification, 2016.

[43] Barath Narayanan Narayanan, et al. Performance analysis of machine learning and pattern recognition algorithms for Malware classification, 2016, 2016 IEEE National Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS).

[44] Beilun Wang, et al. A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Examples, 2016, ICLR 2017.

[45] Philip K. Chan, et al. Malware classification using static analysis based features, 2017, 2017 IEEE Symposium Series on Computational Intelligence (SSCI).

[46] Tyler Moore, et al. Polymorphic Malware Detection Using Sequence Classification Methods, 2016, 2016 IEEE Security and Privacy Workshops (SPW).

[47] Ananthram Swami, et al. Building Better Detection with Privileged Information, 2016, ArXiv.

[48] Kasarapu Ramani. Performance Comparison of Machine Learning Algorithms, 2018.

[49] Sung-Bae Cho, et al. Malware Detection Using Deep Transferred Generative Adversarial Networks, 2017, ICONIP.

[50] Gurinder Shahi. Technology in a Changing World, 2009.

[51] Beilun Wang, et al. DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples, 2017, ICLR.