Multifamily Classification of Android Malware With a Fuzzy Strategy to Resist Polymorphic Familial Variants

Multifamily classification of Android malware aims to assign a malicious sample to one of a given set of malware families. This problem is believed to be more significant than binary classification (simply identifying a sample as malicious or benign) because it can reveal the behaviour patterns of multiple malware families and offer deeper insight into the working mechanisms of malicious payloads. The main challenges of multifamily classification are twofold: recognizing the behaviour patterns of malware families, and addressing the code obfuscation and polymorphic variants that adversaries commonly use to evade rigorous detection. To address these challenges, in this article we use regular expressions of callbacks to describe the behaviour patterns of malware families, and propose a two-step fuzzy processing strategy to resist potential polymorphic familial variants. The alphabet of these regular expressions consists only of security-sensitive API calls, which enables the regular expressions to resist various kinds of code obfuscation and metamorphism. The proposed fuzzy strategy, applied to the regular expressions, comprises two steps: the first transforms an original regular expression into a fuzzy regular expression that has a broader meaning than the original; the second relaxes the precise plaintext match between two regular expressions to a fuzzy match by introducing the notion of similarity between regular expressions. Applying this strategy raises the abstraction level of a regular expression and makes the behaviour pattern it specifies more resilient to code obfuscation and polymorphic variants. Furthermore, selecting the fuzzy regular expressions as features, we use text mining techniques to train a multifamily 1-NN classifier over 3270 samples of 65 families. The experimental results show that our approach outperforms most of the state-of-the-art approaches and tools, confirming its effectiveness.
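
The two-step idea can be illustrated with a minimal Python sketch. It is not the article's implementation: the abstract does not specify how a regular expression is made fuzzy or how similarity is computed, so everything below is an assumption made for illustration. Behaviour patterns are simplified to plain sequences of API names, the `API_CATEGORY` mapping is hypothetical, step 1 is approximated by mapping each API to a coarser category and collapsing consecutive repeats, and step 2 is approximated by the sequence-matching ratio from Python's `difflib`.

```python
from difflib import SequenceMatcher

# Hypothetical mapping from concrete security-sensitive API calls to coarser
# categories. The article's actual abstraction is not given in the abstract.
API_CATEGORY = {
    "sendTextMessage": "SMS",
    "sendMultipartTextMessage": "SMS",
    "getDeviceId": "DEVICE_INFO",
    "getSubscriberId": "DEVICE_INFO",
    "openConnection": "NETWORK",
}


def fuzzify(pattern):
    """Step 1 (assumed): broaden a pattern by replacing each API call with
    its category and collapsing consecutive repeats of the same category."""
    fuzzy = []
    for api in pattern:
        category = API_CATEGORY.get(api, api)  # unknown APIs are kept as-is
        if not fuzzy or fuzzy[-1] != category:
            fuzzy.append(category)
    return tuple(fuzzy)


def similarity(a, b):
    """Step 2 (assumed): score two fuzzy patterns in [0, 1] instead of
    requiring an exact match."""
    return SequenceMatcher(None, a, b).ratio()


def classify_1nn(sample, training):
    """Return (family, score) of the most similar training pattern."""
    fuzzy_sample = fuzzify(sample)
    best_family, best_score = None, -1.0
    for pattern, family in training:
        score = similarity(fuzzy_sample, fuzzify(pattern))
        if score > best_score:
            best_family, best_score = family, score
    return best_family, best_score


# Toy training set with made-up family labels.
training = [
    (["getDeviceId", "openConnection"], "FamilyA"),
    (["sendTextMessage", "sendTextMessage"], "FamilyB"),
]

# A variant that swaps getDeviceId for getSubscriberId still maps to FamilyA,
# because both calls fall into the same assumed category.
variant = ["getSubscriberId", "openConnection"]
family, score = classify_1nn(variant, training)
```

In this toy setting the variant matches `FamilyA` even though its concrete API calls differ from the training pattern, which is the kind of resilience to polymorphic variants that the fuzzy strategy is meant to provide.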
