Learning the PE Header, Malware Detection with Minimal Domain Knowledge

Many efforts have been made to use various forms of domain knowledge in malware detection. Currently there exist two common approaches to malware detection without domain knowledge, namely byte n-grams and strings. In this work we explore the feasibility of applying neural networks to malware detection and feature learning. We do this by restricting ourselves to a minimal amount of domain knowledge in order to extract a portion of the Portable Executable (PE) header. By doing this we show that neural networks can learn from raw bytes without explicit feature construction, and perform even better than a domain knowledge approach that parses the PE header into explicit features.

[1]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[2]  Ohm Sornil,et al.  Classification of malware families based on N-grams sequential pattern features , 2013, 2013 IEEE 8th Conference on Industrial Electronics and Applications (ICIEA).

[3]  Chia-Hua Ho,et al.  An improved GLMNET for l1-regularized logistic regression , 2011, J. Mach. Learn. Res..

[4]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[5]  Bianca Zadrozny,et al.  Transforming classifier scores into accurate multiclass probability estimates , 2002, KDD.

[6]  Salvatore J. Stolfo,et al.  Data mining methods for detection of new malicious executables , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[7]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[8]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[9]  Yuval Elovici,et al.  Applying Machine Learning Techniques for Detection of Malicious Code in Network Traffic , 2007, KI.

[10]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[11]  Zoubin Ghahramani,et al.  A Theoretically Grounded Application of Dropout in Recurrent Neural Networks , 2015, NIPS.

[12]  Zachary Chase Lipton The mythos of model interpretability , 2016, ACM Queue.

[13]  Gilles Louppe,et al.  Understanding variable importances in forests of randomized trees , 2013, NIPS.

[14]  Mark Stamp,et al.  Hunting for metamorphic engines , 2006, Journal in Computer Virology.

[15]  Charu C. Aggarwal,et al.  On the Surprising Behavior of Distance Metrics in High Dimensional Spaces , 2001, ICDT.

[16]  Muhammad Zubair Shafiq,et al.  PE-Miner: Mining Structural Information to Detect Malicious Executables in Realtime , 2009, RAID.

[17]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[18]  Lior Rokach,et al.  Improving malware detection by applying multi-inducer ensemble , 2009, Comput. Stat. Data Anal..

[19]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[20]  Carlos Guestrin,et al.  "Why Should I Trust You?": Explaining the Predictions of Any Classifier , 2016, ArXiv.

[21]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[22]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[23]  Yuval Elovici,et al.  Unknown Malcode Detection Using OPCODE Representation , 2008, EuroISI.

[24]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[25]  Joachim M. Buhmann,et al.  The Balanced Accuracy and Its Posterior Distribution , 2010, 2010 20th International Conference on Pattern Recognition.

[26]  Jürgen Schmidhuber,et al.  Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[27]  Wenke Lee,et al.  McBoost: Boosting Scalability in Malware Collection and Analysis Using Statistical Classification of Executables , 2008, 2008 Annual Computer Security Applications Conference (ACSAC).

[28]  Dimva,et al.  Detection of Intrusions and Malware, and Vulnerability Assessment, 5th International Conference, DIMVA 2008, Paris, France, July 10-11, 2008. Proceedings , 2008, DIMVA.

[29]  Christopher Joseph Pal,et al.  Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[31]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[32]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[33]  Razvan Pascanu,et al.  Malware classification with recurrent networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[34]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[35]  Edward Raff,et al.  JSAT: Java Statistical Analysis Tool, a Library for Machine Learning , 2017, J. Mach. Learn. Res..

[36]  Bhavani M. Thuraisingham,et al.  A scalable multi-level feature extraction technique to detect malicious executables , 2007, Inf. Syst. Frontiers.

[37]  Eric Filiol,et al.  Structural analysis of binary executable headers for malware detection optimization , 2017, Journal of Computer Virology and Hacking Techniques.

[38]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[39]  Marcus A. Maloof,et al.  Learning to Detect and Classify Malicious Executables in the Wild , 2006, J. Mach. Learn. Res..

[40]  Muhammad Zubair Shafiq,et al.  Embedded Malware Detection Using Markov n-Grams , 2008, DIMVA.

[41]  Yuval Elovici,et al.  Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey , 2009, Inf. Secur. Tech. Rep..

[42]  Jürgen Schmidhuber,et al.  Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.

[43]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Bhavani M. Thuraisingham,et al.  Cloud-based malware detection for evolving data streams , 2011, ACM Trans. Manag. Inf. Syst..

[45]  Phil Blunsom,et al.  Reasoning about Entailment with Neural Attention , 2015, ICLR.

[46]  G. Miller Learning to Forget , 2004, Science.

[47]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[48]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[49]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[50]  Pierre Geurts,et al.  Extremely randomized trees , 2006, Machine Learning.

[51]  Sean R Eddy,et al.  What is dynamic programming? , 2004, Nature Biotechnology.

[52]  Yuval Elovici,et al.  Unknown malcode detection and the imbalance problem , 2009, Journal in Computer Virology.

[53]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[54]  Edward Raff,et al.  An investigation of byte n-gram features for malware classification , 2018, Journal of Computer Virology and Hacking Techniques.

[55]  Mehryar Mohri,et al.  AUC Optimization vs. Error Rate Minimization , 2003, NIPS.

[56]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[57]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[58]  Jianping Yin,et al.  Malicious Codes Detection Based on Ensemble Learning , 2007, ATC.

[59]  Ross B. Girshick,et al.  Reducing Overfitting in Deep Networks by Decorrelating Representations , 2015, ICLR.

[60]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[61]  Dawn Xiaodong Song,et al.  Recognizing Functions in Binaries with Neural Networks , 2015, USENIX Security Symposium.

[62]  Salvatore J. Stolfo,et al.  Towards Stealthy Malware Detection , 2007, Malware Detection.

[63]  Rich Caruana,et al.  Predicting good probabilities with supervised learning , 2005, ICML.