Effective and Light-Weight Deobfuscation and Semantic-Aware Attack Detection for PowerShell Scripts

In recent years, PowerShell has been increasingly reported in a variety of cyber attacks, including advanced persistent threats, ransomware, phishing emails, cryptojacking, financial threats, and fileless attacks. However, because the PowerShell language is dynamic by design and can construct script pieces at different levels, state-of-the-art PowerShell attack detection approaches based on static analysis are inherently vulnerable to obfuscation. To overcome this challenge, in this paper we design the first effective and light-weight deobfuscation approach for PowerShell scripts. To precisely identify the recoverable script pieces, we design a novel subtree-based deobfuscation method that performs obfuscation detection and emulation-based recovery at the level of subtrees in the abstract syntax tree of a PowerShell script. Building upon this deobfuscation method, we further design the first semantic-aware PowerShell attack detection system. To enable semantic-based detection, we leverage the classic objective-oriented association mining algorithm and newly identify 31 semantic signatures for PowerShell attacks. We evaluate our approach on a collection of 2342 benign samples and 4141 malicious samples. Our deobfuscation method takes less than 0.5 seconds on average and increases the similarity between the obfuscated and original scripts from only 0.5% to around 80%, making it both effective and light-weight. In addition, with our deobfuscation applied, the attack detection rates of Windows Defender and VirusTotal increase substantially, from 0.3% and 2.65% to 75.0% and 90.0%, respectively. Furthermore, with our deobfuscation applied, our semantic-aware attack detection system outperforms both Windows Defender and VirusTotal, with a 92.3% true positive rate and a 0% false positive rate on average.
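The idea of recovering obfuscated pieces subtree by subtree can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes Python's own `ast` module as a stand-in for a PowerShell abstract syntax tree, treats "string concatenation of literals" as the only obfuscation, and folds the value directly instead of emulating the subtree. The names `SubtreeFolder` and `recover` are hypothetical.

```python
import ast


class SubtreeFolder(ast.NodeTransformer):
    """Bottom-up pass that replaces recoverable subtrees with their value.

    Assumption: a subtree is "recoverable" only if it concatenates two
    string literals. A real system would detect many more obfuscation
    patterns and emulate the subtree rather than fold it by hand.
    """

    def visit_BinOp(self, node):
        # Recover the child subtrees first, so folding proceeds bottom-up.
        self.generic_visit(node)
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, str)
                and isinstance(node.right.value, str)):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        # Subtrees that depend on unknown values are left untouched.
        return node


def recover(source):
    """Parse the source, fold recoverable subtrees, and emit the result."""
    tree = SubtreeFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9 or later


if __name__ == "__main__":
    print(recover('cmd = "Inv" + "oke-" + "Expression"'))
    print(recover('cmd = "Inv" + suffix'))
```

Working at the subtree level means that a piece such as `"Inv" + suffix`, whose value depends on context that is not available, is simply left as-is, while neighbouring subtrees made only of literals are still recovered.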
