RENN: Efficient Reverse Execution with Neural-Network-Assisted Alias Analysis

Reverse execution and coredump analysis have long been used to diagnose the root cause of software crashes. Each of these techniques, however, faces inherent challenges, such as limited ability to handle memory aliases. Recent works have used hypothesis testing to address this drawback, but at a high computational cost that makes them impractical for real-world applications. To address this issue, we propose a new deep neural architecture that can significantly improve memory alias resolution. At a high level, our approach employs a recurrent neural network (RNN) to learn the binary code patterns pertaining to memory accesses. It then infers the memory region accessed by each memory reference. Since memory references to different regions naturally indicate a non-alias relationship, our neural architecture can greatly reduce the burden of performing hypothesis testing to track down non-alias relations in binary code. Unlike previous research that has applied deep learning to other binary analysis tasks, the neural network proposed in this work is fundamentally novel. Instead of simply using off-the-shelf neural networks, we design a new recurrent neural architecture that can capture the data dependency between machine code segments. To demonstrate the utility of our deep neural architecture, we implement it as RENN, a neural-network-assisted reverse execution system. We use this tool to analyze software crashes corresponding to 40 real-world memory corruption vulnerabilities. Our experiments show that RENN can significantly improve the efficiency of locating the root cause of these crashes. Compared to a state-of-the-art technique, RENN runs 36.25% faster on average, detects an average of 21.35% more non-alias pairs, and identifies the root cause in 12.5% more cases.
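The region-based pruning idea described above can be illustrated with a minimal sketch. It is not RENN's implementation: the region labels (`"stack"`, `"heap"`), the reference identifiers, and the function name `split_alias_candidates` are all hypothetical, and the predicted regions are hard-coded here rather than produced by a neural network. The sketch only shows how region predictions could separate pairs that are known not to alias from pairs that would still need hypothesis testing.

```python
from itertools import combinations


def split_alias_candidates(predicted_regions):
    """Partition memory-reference pairs by predicted region.

    predicted_regions maps a (hypothetical) memory-reference id to a
    region label, or to None when no region could be inferred.
    Returns (non_alias, needs_testing) as two lists of id pairs.
    """
    non_alias = []
    needs_testing = []
    for a, b in combinations(sorted(predicted_regions), 2):
        region_a = predicted_regions[a]
        region_b = predicted_regions[b]
        if region_a is not None and region_b is not None and region_a != region_b:
            # References to different regions cannot alias.
            non_alias.append((a, b))
        else:
            # Same region or unknown region: the relation is still open.
            needs_testing.append((a, b))
    return non_alias, needs_testing


# Hard-coded stand-in for model output (assumed labels, for illustration only).
regions = {"ref0": "stack", "ref1": "heap", "ref2": "stack", "ref3": None}
non_alias, needs_testing = split_alias_candidates(regions)
print(non_alias)
print(len(needs_testing))
```

In this toy input, only the pairs whose regions are both known and different are ruled out up front; every other pair remains a candidate for the more expensive analysis.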
