SeqDroid: Obfuscated Android Malware Detection Using Stacked Convolutional and Recurrent Neural Networks

To evade detection, attackers usually obfuscate malicious Android applications. These applications often have randomly generated application IDs or package names, and they are often signed with randomly created certificates. Conventional machine learning models for detecting such malware are neither sufficiently robust nor scalable to the volume of Android applications produced daily. Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have been applied to identify malware by learning patterns in sequence data. We propose a novel classification method for malicious Android applications that stacks RNNs and CNNs so that the model learns the generalized correlation between obfuscated string patterns in an application's package name and its certificate owner name. The model extracts features using an embedding layer and gated recurrent units (GRUs), and an additional CNN unit further optimizes the feature extraction process. Our experiments demonstrate that our approach outperforms n-gram-based models and that our feature extraction method is robust to obfuscation and sufficiently lightweight for Android devices.
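
As a rough illustration of the kind of input this description implies, the sketch below turns a package name and a certificate owner name into fixed-length sequences of character IDs (the form an embedding layer consumes) and, for contrast, counts character n-grams as a baseline model might. The alphabet, the padding and unknown-character IDs, the maximum length of 32, and all function names are assumptions made for this example; the abstract does not specify them.

```python
from collections import Counter

# Assumed conventions for this sketch (not taken from the paper):
# ID 0 pads short strings, ID 1 stands for any character outside the alphabet.
PAD, UNK = 0, 1
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789._-,= "
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode(text, max_len=32):
    """Map a string to a fixed-length list of character IDs."""
    ids = [CHAR_TO_ID.get(ch, UNK) for ch in text.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, as an n-gram baseline might."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def build_inputs(package_name, cert_owner, max_len=32):
    """Encode the two strings the abstract names as model inputs."""
    return encode(package_name, max_len), encode(cert_owner, max_len)


if __name__ == "__main__":
    # Hypothetical obfuscated-looking values, invented for illustration.
    pkg_ids, owner_ids = build_inputs("com.xkqz.a7f3", "CN=qwpzt, O=zzrk")
    print(pkg_ids)
    print(owner_ids)
    print(char_ngrams("com.xkqz.a7f3").most_common(3))
```

A sequence model would embed each ID and read the result in order, whereas the n-gram counts discard position beyond the window size. That difference is one plausible reason a sequence model could generalize better across randomly generated names, though the abstract itself does not give that explanation.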
