Classification of Android apps and malware using deep neural networks

Malware targeting mobile devices is a pervasive problem in modern life. The detection of malware is essentially a software classification problem based on information gathered from program analysis. We focus on classification of Android applications using system API-call sequences and investigate the effectiveness of Deep Neural Networks (DNNs) for such purpose. The ability of DNNs to learn complex and flexible features may lead to timely and effective detection of malware. We design a Convolutional Neural Network (CNN) for sequence classification and conduct a set of experiments on malware detection and categorization of software into functionality groups to test and compare our CNN with classifications by recurrent neural network (LSTM) and other n-gram based methods. Both CNN and LSTM significantly outperformed n-gram based methods. Surprisingly, the performance of our CNN is also much better than that of the LSTM, which is considered a natural choice for sequential data.

[1]  Jack W. Stokes,et al.  Large-scale malware classification using random projections and neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[3]  Xuxian Jiang,et al.  DroidChameleon: evaluating Android anti-malware against transformation attacks , 2013, ASIA CCS '13.

[4]  Xuxian Jiang,et al.  Catch Me If You Can: Evaluating Android Anti-Malware Against Transformation Attacks , 2014, IEEE Transactions on Information Forensics and Security.

[5]  Shahid Alam,et al.  DroidNative: Semantic-Based Detection of Android Native Code Malware , 2016, ArXiv.

[6]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[7]  Sotiris Ioannidis,et al.  Rage against the virtual machine: hindering dynamic analysis of Android malware , 2014, EuroSec '14.

[8]  Colin Raffel,et al.  Lasagne: First release. , 2015 .

[9]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[10]  Giorgio Giacinto,et al.  Stealth attacks: An extended insight into the obfuscation effects on Android malware , 2015, Comput. Secur..

[11]  Nicolas Christin,et al.  Evading android runtime analysis via sandbox detection , 2014, AsiaCCS.

[12]  Vijay Laxmi,et al.  AndroSimilar: Robust signature for detecting variants of Android malware , 2015, J. Inf. Secur. Appl..

[13]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[14]  Tong Zhang,et al.  Effective Use of Word Order for Text Categorization with Convolutional Neural Networks , 2014, NAACL.

[15]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[16]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[17]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[18]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[19]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[20]  Cícero Nogueira dos Santos,et al.  Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts , 2014, COLING.

[21]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Yajin Zhou,et al.  RiskRanker: scalable and accurate zero-day android malware detection , 2012, MobiSys '12.

[23]  Mansour Ahmadi,et al.  DroidScribe: Classifying Android Malware Based on Runtime Behavior , 2016, 2016 IEEE Security and Privacy Workshops (SPW).

[24]  Juan E. Tapiador,et al.  Dendroid: A text mining approach to analyzing and classifying code structures in Android malware families , 2014, Expert Syst. Appl..

[25]  Zhenlong Yuan,et al.  DroidDetector: Android Malware Characterization and Detection Using Deep Learning , 2016 .

[26]  Jun Zhao,et al.  Recurrent Convolutional Neural Networks for Text Classification , 2015, AAAI.

[27]  Arun Lakhotia,et al.  DroidLegacy: Automated Familial Classification of Android Malware , 2014, PPREW'14.

[28]  John Salvatier,et al.  Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.