DeepAM: a heterogeneous deep learning framework for intelligent malware detection

With computers and the Internet being essential in everyday life, malware poses serious and evolving threats to their security, making the detection of malware of utmost concern. Accordingly, there have been many researches on intelligent malware detection by applying data mining and machine learning techniques. Though great results have been achieved with these methods, most of them are built on shallow learning architectures. Due to its superior ability in feature learning through multilayer deep architecture, deep learning is starting to be leveraged in industrial and academic research for different applications. In this paper, based on the Windows application programming interface calls extracted from the portable executable files, we study how a deep learning architecture can be designed for intelligent malware detection. We propose a heterogeneous deep learning framework composed of an AutoEncoder stacked up with multilayer restricted Boltzmann machines and a layer of associative memory to detect newly unknown malware. The proposed deep learning model performs as a greedy layer-wise training operation for unsupervised feature learning, followed by supervised parameter fine-tuning. Different from the existing works which only made use of the files with class labels (either malicious or benign) during the training phase, we utilize both labeled and unlabeled file samples to pre-train multiple layers in the heterogeneous deep learning framework from bottom to up for feature learning. A comprehensive experimental study on a real and large file collection from Comodo Cloud Security Center is performed to compare various malware detection approaches. Promising experimental results demonstrate that our proposed deep learning framework can further improve the overall performance in malware detection compared with traditional shallow learning methods, deep learning methods with homogeneous framework, and other existing anti-malware scanners. The proposed heterogeneous deep learning framework can also be readily applied to other malware detection tasks.

[1]  Geoffrey E. Hinton,et al.  The "wake-sleep" algorithm for unsupervised neural networks. , 1995, Science.

[2]  Yee Whye Teh,et al.  Rate-coded Restricted Boltzmann Machines for Face Recognition , 2000, NIPS.

[3]  Salvatore J. Stolfo,et al.  Data mining methods for detection of new malicious executables , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[4]  Jau-Hwang Wang,et al.  Virus detection using data mining techinques , 2003, IEEE 37th Annual 2003 International Carnahan Conference onSecurity Technology, 2003. Proceedings..

[5]  Andrew H. Sung,et al.  Static analyzer of vicious executables (SAVE) , 2004, 20th Annual Computer Security Applications Conference.

[6]  Marcus A. Maloof,et al.  Learning to detect malicious executables in the wild , 2004, KDD.

[7]  Jim Wilson The great bank robbery , 2005 .

[8]  Miguel Á. Carreira-Perpiñán,et al.  On Contrastive Divergence Learning , 2005, AISTATS.

[9]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[10]  Eric Filiol,et al.  Malware Pattern Scanning Schemes Secure Against Black-box Analysis , 2006, Journal in Computer Virology.

[11]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[12]  Eric Filiol,et al.  Evaluation methodology and theoretical model for antiviral behavioural detection strategies , 2007, Journal in Computer Virology.

[13]  rey O. Kephart,et al.  Automatic Extraction of Computer Virus SignaturesJe , 2006 .

[14]  Thomas Hofmann,et al.  Greedy Layer-Wise Training of Deep Networks , 2007 .

[15]  Zhuoqing Morley Mao,et al.  Automated Classification and Analysis of Internet Malware , 2007, RAID.

[16]  Geoffrey E. Hinton,et al.  To recognize shapes, first learn to generate images. , 2007, Progress in brain research.

[17]  Yoshua Bengio,et al.  Scaling learning algorithms towards AI , 2007 .

[18]  Robert Dunne,et al.  A Statistical Approach to Neural Networks for Pattern Recognition , 2007 .

[19]  Yanfang Ye,et al.  IMDS: intelligent malware detection system , 2007, KDD '07.

[20]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[21]  Tao Li,et al.  An intelligent PE-malware detection system based on association mining , 2008, Journal in Computer Virology.

[22]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[23]  Christopher Krügel,et al.  A survey on automated dynamic malware-analysis techniques and tools , 2012, CSUR.

[24]  Qinghua Zhang,et al.  AntiBot: Clustering Common Semantic Patterns for Bot Detection , 2010, 2010 IEEE 34th Annual Computer Software and Applications Conference.

[25]  Y-Lan Boureau,et al.  Learning Convolutional Feature Hierarchies for Visual Recognition , 2010, NIPS.

[26]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[27]  Derek C. Rose,et al.  Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier] , 2010, IEEE Computational Intelligence Magazine.

[28]  Lilly Suriani Affendey,et al.  Intrusion detection using data mining techniques , 2010, 2010 International Conference on Information Retrieval & Knowledge Management (CAMP).

[29]  Yanfang Ye,et al.  Combining file content and file relations for cloud based malware detection , 2011, KDD.

[30]  Bhavani M. Thuraisingham,et al.  Cloud-based malware detection for evolving data streams , 2011, ACM Trans. Manag. Inf. Syst..

[31]  Geoffrey E. Hinton A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[32]  Asaf Shabtai,et al.  Detecting malware through temporal function-based features. , 2013, CCS 2013.

[33]  Arun Lakhotia,et al.  Countering malware evolution using cloud-based learning , 2013, 2013 8th International Conference on Malicious and Unwanted Software: "The Americas" (MALWARE).

[34]  Asaf Shabtai,et al.  POSTER: Detecting malware through temporal function-based features , 2013, CCS.

[35]  Kiran Bhowmick,et al.  Virus Detection using Artificial Neural Networks , 2013 .

[36]  Guanhua Yan,et al.  Discriminant malware distance learning on structural information for automated malware classification , 2013, SIGMETRICS.

[37]  Wanlei Zhou,et al.  Control Flow-Based Malware VariantDetection , 2014, IEEE Transactions on Dependable and Secure Computing.

[38]  Wenhao Huang,et al.  Deep Architecture for Traffic Flow Prediction: Deep Belief Networks With Multitask Learning , 2014, IEEE Transactions on Intelligent Transportation Systems.

[39]  Yuancheng Li,et al.  A Hybrid Malicious Code Detection Method based on Deep Learning , 2015 .

[40]  Fei-Yue Wang,et al.  Traffic Flow Prediction With Big Data: A Deep Learning Approach , 2015, IEEE Transactions on Intelligent Transportation Systems.

[41]  Yanfang Ye,et al.  Cluster-oriented ensemble classifiers for intelligent malware detection , 2015, Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015).