MtNet: A Multi-Task Neural Network for Dynamic Malware Classification

In this paper, we propose a new multi-task, deep learning architecture for malware classification for the binary i.e. malware versus benign malware classification task. All models are trained with data extracted from dynamic analysis of malicious and benign files. For the first time, we see improvements using multiple layers in a deep neural network architecture for malware classification. The system is trained on 4.5 million files and tested on a holdout test set of 2 million files which is the largest study to date. To achieve a binary classification error rate of 0.358i¾?%, the objective functions for the binary classification task and malware family classification task are combined in the multi-task architecture. In addition, we propose a standard i.e. non multi-task malware family classification architecture which also achieves a malware family classification error rate of 2.94i¾?%.

[1]  Christopher Krügel,et al.  Efficient Detection of Split Personalities in Malware , 2010, NDSS.

[2]  Razvan Benchea,et al.  Combining Restricted Boltzmann Machine and One Side Perceptron for Malware Detection , 2014, ICCS.

[3]  Aditya P. Mathur,et al.  A Survey of Malware Detection Techniques , 2007 .

[4]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[5]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[6]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[7]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[8]  Ananthram Swami,et al.  The Limitations of Deep Learning in Adversarial Settings , 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[9]  Salvatore J. Stolfo,et al.  Data mining methods for detection of new malicious executables , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[10]  Geoffrey Zweig,et al.  An introduction to computational networks and the computational network toolkit (invited talk) , 2014, INTERSPEECH.

[11]  Konstantin Berlin,et al.  Deep neural network based malware detection using two dimensional binary program features , 2015, 2015 10th International Conference on Malicious and Unwanted Software (MALWARE).

[12]  Kenneth Ward Church,et al.  Very sparse random projections , 2006, KDD '06.

[13]  Travis Atkison Applying randomized projection to aid prediction algorithms in detecting high-dimensional rogue applications , 2009, ACM-SE 47.

[14]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .

[15]  Jasha Droppo,et al.  Multi-task learning in deep neural networks for improved phoneme recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[16]  U. Bayer,et al.  TTAnalyze: A Tool for Analyzing Malware , 2006 .

[17]  Jeffrey O. Kephart,et al.  A biologically inspired immune system for computers , 1994 .

[18]  Jack W. Stokes,et al.  Large-scale malware classification using random projections and neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[19]  Razvan Pascanu,et al.  Malware classification with recurrent networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[21]  Andrew Zisserman,et al.  Deep Features for Text Spotting , 2014, ECCV.

[22]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[23]  Marcus A. Maloof,et al.  Learning to Detect and Classify Malicious Executables in the Wild , 2006, J. Mach. Learn. Res..

[24]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[25]  Yoshua Bengio,et al.  Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .