Network-based Malware Detection with a Two-tier Architecture for Online Incremental Update

As smartphones carry more and more private information, they have become the main target of malware attacks. Threats on mobile devices have become increasingly sophisticated, making it imperative to develop effective tools that can detect and counter them. Unfortunately, existing malware detection tools based on machine learning struggle to keep up because it is difficult to update their detection models incrementally online. In this paper, we propose a Two-tier Architecture Malware Detection (TAMD) method, which learns from the statistical features of network traffic to detect malware. The first layer of TAMD identifies uncertain samples in the training set through a preliminary classification, whereas the second layer builds an improved classifier by filtering out such samples. We enhance TAMD with an incremental-learning-based technique (TAMD-IL), which updates the detection model incrementally by removing and adding sub-models in TAMD, without retraining the model from scratch. We experimentally demonstrate that TAMD outperforms existing methods, achieving up to 98.72% precision and 96.57% recall. We also evaluate TAMD-IL on four concept drift datasets and compare it with classical machine learning algorithms, two state-of-the-art malware detection technologies, and three incremental learning technologies. Experimental results show that TAMD-IL is efficient in terms of both update time and memory usage.
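The two-layer idea and the sub-model update described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the classifier, the uncertainty criterion, or the rule for choosing which sub-model to remove, so the nearest-centroid stand-in `CentroidModel`, the margin `threshold`, the drop-the-oldest policy in `Ensemble`, and the toy two-feature data are all assumptions made for illustration.

```python
from statistics import mean

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class CentroidModel:
    """Stand-in classifier (assumption): one centroid per class."""
    def fit(self, X, y):
        self.centroids = {
            c: [mean(col) for col in zip(*[x for x, lab in zip(X, y) if lab == c])]
            for c in set(y)
        }
        return self

    def margin(self, x):
        """Return (predicted label, confidence margin in [0, 1])."""
        ranked = sorted((dist(x, c), lab) for lab, c in self.centroids.items())
        (d0, lab0), (d1, _) = ranked[0], ranked[1]
        return lab0, (d1 - d0) / (d1 + d0 + 1e-12)

def train_two_tier(X, y, threshold=0.2):
    """Layer 1 flags low-margin training samples; layer 2 trains without them."""
    tier1 = CentroidModel().fit(X, y)
    keep = [(x, lab) for x, lab in zip(X, y) if tier1.margin(x)[1] >= threshold]
    Xf = [x for x, _ in keep]
    yf = [lab for _, lab in keep]
    tier2 = CentroidModel().fit(Xf, yf)
    return tier2, len(X) - len(keep)

class Ensemble:
    """Incremental update by adding and removing sub-models.
    Dropping the oldest sub-model is an assumed policy."""
    def __init__(self, max_models=3):
        self.models = []
        self.max_models = max_models

    def update(self, X, y):
        model, _ = train_two_tier(X, y)
        self.models.append(model)          # add a sub-model for the new batch
        if len(self.models) > self.max_models:
            self.models.pop(0)             # remove the oldest sub-model

    def predict(self, x):
        votes = [m.margin(x)[0] for m in self.models]
        return max(set(votes), key=votes.count)

# Toy traffic features (hypothetical); label 0 = benign, 1 = malware.
X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [5.0, 5.0],
     [5.1, 4.8], [4.9, 5.2], [3.0, 3.0], [3.1, 2.9]]
y = [0, 0, 0, 1, 1, 1, 0, 1]

tier2, removed = train_two_tier(X, y)
ensemble = Ensemble(max_models=3)
for _ in range(4):                         # four batches; only three sub-models kept
    ensemble.update(X, y)
```

In this toy data the two samples near the class boundary have almost no margin under the preliminary classifier, so they are the ones filtered out before the second classifier is trained. Each call to `update` trains only one new sub-model, which is the property that avoids retraining the whole model from scratch.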
