Automated Synthesis of Semantic Malware Signatures using Maximum Satisfiability

This paper proposes a technique for automatically learning semantic malware signatures for Android from very few samples of a malware family. The key idea underlying our technique is to look for a maximally suspicious common subgraph (MSCS) that is shared between all known instances of a malware family. An MSCS describes the shared functionality between multiple Android applications in terms of inter-component call relations and their semantic metadata (e.g., data-flow properties). Our approach identifies such maximally suspicious common subgraphs by reducing the problem to maximum satisfiability. Once a semantic signature is learned, our approach uses a combination of static analysis and a new approximate signature matching algorithm to determine whether an Android application matches the semantic signature characterizing a given malware family. We have implemented our approach in a tool called ASTROID and show that it has a number of advantages over state-of-the-art malware detection techniques. First, we compare the semantic malware signatures automatically synthesized by ASTROID with manually-written signatures used in previous work and show that the signatures learned by ASTROID perform better in terms of accuracy as well as precision. Second, we compare ASTROID against two state-of-the-art malware detection tools and demonstrate its advantages in terms of interpretability and accuracy. Finally, we demonstrate that ASTROID's approximate signature matching algorithm is resistant to behavioral obfuscation and that it can be used to detect zero-day malware. In particular, we were able to find 22 instances of zero-day malware in Google Play that are not reported as malware by existing tools.

[1]  Douglas S. Reeves,et al.  Fast malware classification by automated behavioral graph matching , 2010, CSIIRW '10.

[2]  Isil Dillig,et al.  Apposcopy: semantics-based detection of Android malware through static analysis , 2014, SIGSOFT FSE.

[3]  Jacques Klein,et al.  Effective Inter-Component Communication Mapping in Android: An Essential Step Towards Holistic Security Analysis , 2013, USENIX Security Symposium.

[4]  Kang G. Shin,et al.  Large-scale malware indexing using function-call graphs , 2009, CCS.

[5]  Byung-Gon Chun,et al.  TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones , 2010, OSDI.

[6]  Tao Xie,et al.  AppContext: Differentiating Malicious and Benign Mobile App Behaviors Using Context , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[7]  Vasco M. Manquinho,et al.  Open-WBO: A Modular MaxSAT Solver, , 2014, SAT.

[8]  Jacques Klein,et al.  Effective inter-component communication mapping in Android with Epicc: an essential step towards holistic security analysis , 2013 .

[9]  Isil Dillig,et al.  EXPLORER : query- and demand-driven exploration of interprocedural control flow properties , 2015, OOPSLA.

[10]  Heng Yin,et al.  DroidScope: Seamlessly Reconstructing the OS and Dalvik Semantic Views for Dynamic Android Malware Analysis , 2012, USENIX Security Symposium.

[11]  Wenke Lee,et al.  CHEX: statically vetting Android apps for component hijacking vulnerabilities , 2012, CCS.

[12]  Jeff H. Perkins,et al.  Information Flow Analysis of Android Applications in DroidSafe , 2015, NDSS.

[13]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[14]  Somesh Jha,et al.  Semantics-aware malware detection , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).

[15]  Yajin Zhou,et al.  RiskRanker: scalable and accurate zero-day android malware detection , 2012, MobiSys '12.

[16]  Stephen McCamant,et al.  HI-CFG: Construction by Binary Analysis and Application to Attack Polymorphism , 2013, ESORICS.

[17]  Yannis Smaragdakis,et al.  Hybrid context-sensitivity for points-to analysis , 2013, PLDI.

[18]  Yajin Zhou,et al.  Hey, You, Get Off of My Market: Detecting Malicious Apps in Official and Alternative Android Markets , 2012, NDSS.

[19]  Peng Wang,et al.  Finding Unknown Malice in 10 Seconds: Mass Vetting for New Threats at the Google-Play Scale , 2015, USENIX Security Symposium.

[20]  Kang G. Shin,et al.  Behavioral detection of malware on mobile handsets , 2008, MobiSys '08.

[21]  Patrick D. McDaniel,et al.  On lightweight mobile phone application certification , 2009, CCS.

[22]  Chao Yang,et al.  DroidMiner: Automated Mining and Characterization of Fine-grained Malicious Behaviors in Android Applications , 2014, ESORICS.

[23]  Somesh Jha,et al.  Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors , 2010, 2010 IEEE Symposium on Security and Privacy.

[24]  Tzi-cker Chiueh,et al.  Automatic Generation of String Signatures for Malware Detection , 2009, RAID.

[25]  Yuan Zhang,et al.  AppIntent: analyzing sensitive data transmission in android for privacy leakage detection , 2013, CCS.

[26]  Jacques Klein,et al.  FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps , 2014, PLDI.

[27]  Mu Zhang,et al.  Semantics-Aware Android Malware Classification Using Weighted Contextual API Dependency Graphs , 2014, CCS.

[28]  Xue Liu,et al.  Effective Real-Time Android Application Auditing , 2015, 2015 IEEE Symposium on Security and Privacy.

[29]  Konrad Rieck,et al.  Structural detection of android malware using embedded call graphs , 2013, AISec.

[30]  Konrad Rieck,et al.  DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket , 2014, NDSS.

[31]  Felip Manyà,et al.  MaxSAT, Hard and Soft Constraints , 2021, Handbook of Satisfiability.

[32]  Fernando Pereira,et al.  Yedalog: Exploring Knowledge at Scale , 2015, SNAPL.

[33]  Josep Argelich,et al.  Boolean lexicographic optimization: algorithms & applications , 2011, Annals of Mathematics and Artificial Intelligence.

[34]  Ninghui Li,et al.  Using probabilistic generative models for ranking risks of Android apps , 2012, CCS.