Neutron: an attention-based neural decompiler

Decompilation analyzes and transforms low-level programming language (PL) code, such as binary or assembly code, into equivalent high-level PL code. It plays a vital role in cyberspace security (e.g., software vulnerability discovery and analysis, malicious code detection and analysis) and in software engineering (e.g., source code analysis, optimization, and cross-language, cross-operating-system migration). Unfortunately, existing decompilers rely mainly on rules written by experts, which leads to bottlenecks such as low scalability, difficult development, and long development cycles. The high-level PL code they generate often violates coding conventions and remains hard to read. These problems reduce the efficiency of downstream applications (e.g., vulnerability discovery) that build on decompiled high-level PL code. In this paper, we propose a decompilation approach based on attention-based neural machine translation (NMT), which converts low-level PL into high-level PL that is readable and functionally similar to the original. To compensate for the information asymmetry between low-level and high-level PL, we design a translation method based on the basic operations of the low-level PL. This method improves the generalization of the NMT model and captures the translation rules between PLs more accurately and efficiently. We implement this approach in a neural decompilation framework called Neutron. Evaluation on two practical applications shows that Neutron’s average program accuracy is 96.96%, which is higher than that of the traditional NMT model.
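
As a rough illustration of the attention mechanism that attention-based NMT models rely on, the sketch below computes dot-product attention over a few encoder states in plain Python. It is not Neutron's implementation: the function names (`softmax`, `dot_attention`) and the toy vectors are assumptions made for this example only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state, encoder_states):
    """Return (weights, context) for dot-product attention.

    decoder_state:  list of floats, the current target-side hidden state.
    encoder_states: list of equally sized lists, one per source token
                    (e.g. one per low-level instruction token).
    """
    # Alignment score: dot product of the decoder state with each encoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(decoder_state)
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy, hypothetical values: three source-token states and one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = dot_attention(decoder_state, encoder_states)
print(weights, context)
```

In this toy input, the first and third source states score equally against the decoder state, so they receive equal weight, while the second state receives less. A trained model would learn the hidden states rather than take them as fixed inputs.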
