Deobfuscating Embedded Malware Using Probable-Plaintext Attacks

Malware embedded in documents is regularly used as part of targeted attacks. To hinder detection by anti-virus scanners, the embedded code is usually obfuscated, often with simple Vigenère ciphers based on XOR, ADD and additional ROL instructions. While these ciphers can be easily cracked for short keys, breaking obfuscations with longer keys requires manually reverse engineering the code or dynamically analyzing the documents in a sandbox. In this paper, we present Kandi, a method capable of efficiently decrypting embedded malware obfuscated using Vigenère ciphers. To this end, our method performs a probable-plaintext attack from classic cryptography using strings likely contained in malware binaries, such as header signatures, library names and code fragments. We demonstrate the efficacy of this approach in different experiments. In a controlled setting, Kandi breaks obfuscations using XOR, ADD and ROL instructions with keys up to 13 bytes in less than a second per file. On a collection of real-world malware in Word, PowerPoint and RTF files, Kandi is able to expose obfuscated malware from every fourth document without involved parsing.
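The idea of a probable-plaintext attack on an XOR-based Vigenère cipher can be illustrated with a minimal sketch. It is not Kandi's implementation: it handles only XOR (not ADD or ROL), uses a single probable plaintext, and all names (CRIB, xor_encrypt, self_diff, recover_key) are hypothetical. It relies on the standard observation that XORing a ciphertext with itself shifted by the key length cancels the key, so the same difference computed on a probable plaintext can be searched for directly.

```python
from typing import Optional, Tuple

# Assumed probable plaintext: the DOS stub message found in most PE files.
CRIB = b"This program cannot be run in DOS mode"


def xor_encrypt(data: bytes, key: bytes) -> bytes:
    """Apply a repeating-key XOR (encryption and decryption are identical)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def self_diff(data: bytes, shift: int) -> bytes:
    """XOR data with itself shifted by `shift` bytes.

    For a repeating-key XOR with key length `shift`, the key cancels out,
    so the result equals the same difference taken over the plaintext.
    """
    return bytes(a ^ b for a, b in zip(data, data[shift:]))


def recover_key(cipher: bytes, crib: bytes = CRIB,
                max_len: int = 13) -> Optional[Tuple[int, bytes]]:
    """Return (offset of crib, key) for the shortest matching key length."""
    for klen in range(1, max_len + 1):
        if klen >= len(crib):
            break  # the crib is too short to leave a usable difference
        pos = self_diff(cipher, klen).find(self_diff(crib, klen))
        if pos < 0:
            continue
        # Key stream bytes covering cipher[pos : pos + klen].
        stream = bytes(c ^ p for c, p in zip(cipher[pos:pos + klen], crib))
        # Rotate the stream so that key[0] lines up with file offset 0.
        key = bytes(stream[(i - pos) % klen] for i in range(klen))
        # Confirm the candidate by decrypting and checking for the crib.
        if xor_encrypt(cipher, key)[pos:pos + len(crib)] == crib:
            return pos, key
    return None


if __name__ == "__main__":
    # Synthetic stand-in for an embedded executable (not a real PE file).
    plaintext = b"MZ" + bytes(70) + CRIB + b"\r\n$" + b"kernel32.dll"
    secret = b"\x13\x37\xc0\xde\x42"
    print(recover_key(xor_encrypt(plaintext, secret)))
```

Searching key lengths in increasing order returns the shortest key consistent with the probable plaintext. A longer probable plaintext leaves a longer key-free difference to match, which reduces the chance of accidental hits.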
