FuzzGuard: Filtering out Unreachable Inputs in Directed Grey-box Fuzzing through Deep Learning

Recently, directed grey-box fuzzing (DGF) becomes popular in the field of software testing. Different from coverage-based fuzzing whose goal is to increase code coverage for triggering more bugs, DGF is designed to check whether a piece of potentially buggy code (e.g., string operations) really contains a bug. Ideally, all the inputs generated by DGF should reach the target buggy code until triggering the bug. It is a waste of time when executing with unreachable inputs. Unfortunately, in real situations, large numbers of the generated inputs cannot let a program execute to the target, greatly impacting the efficiency of fuzzing, especially when the buggy code is embedded in the code guarded by various constraints. In this paper, we propose a deep-learning-based approach to predict the reachability of inputs (i.e., miss the target or not) before executing the target program, helping DGF filtering out the unreachable ones to boost the performance of fuzzing. To apply deep learning with DGF, we design a suite of new techniques (e.g., step-forwarding approach, representative data selection) to solve the problems of unbalanced labeled data and insufficient time in the training process. Further, we implement the proposed approach called FuzzGuard and equip it with the state-of-the-art DGF (e.g., AFLGo). Evaluations on 45 real vulnerabilities show that FuzzGuard boosts the fuzzing efficiency of the vanilla AFLGo up to 17.1×. Finally, to understand the key features learned by FuzzGuard, we illustrate their connection with the constraints in the programs.

[1]  Yang Liu,et al.  Skyfire: Data-Driven Seed Generation for Fuzzing , 2017, 2017 IEEE Symposium on Security and Privacy (SP).

[2]  Tieniu Tan,et al.  A Light CNN for Deep Face Representation With Noisy Labels , 2015, IEEE Transactions on Information Forensics and Security.

[3]  Barton P. Miller,et al.  An empirical study of the reliability of UNIX utilities , 1990, Commun. ACM.

[4]  Bihuan Chen,et al.  Hawkeye: Towards a Desired Directed Grey-box Fuzzer , 2018, CCS.

[5]  Herbert Bos,et al.  VUzzer: Application-aware Evolutionary Fuzzing , 2017, NDSS.

[6]  Hao Chen,et al.  Angora: Efficient Fuzzing by Principled Search , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[7]  Rishabh Singh,et al.  Learn&Fuzz: Machine learning for input fuzzing , 2017, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[8]  Peng Liu,et al.  Dynamically Discovering Likely Memory Layout to Perform Accurate Fuzzing , 2016, IEEE Transactions on Reliability.

[9]  Rishabh Singh,et al.  Not all bytes are equal: Neural byte sieve for fuzzing , 2017, ArXiv.

[10]  Yang Liu,et al.  Steelix: program-state based binary fuzzing , 2017, ESEC/SIGSOFT FSE.

[11]  James C. King,et al.  Symbolic execution and program testing , 1976, CACM.

[12]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Mark Raugas,et al.  Faster Fuzzing: Reinitialization with Deep Neural Models , 2017, ArXiv.

[14]  Xiaogang Wang,et al.  T-CNN: Tubelets With Convolutional Neural Networks for Object Detection From Videos , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[15]  Been Kim,et al.  Sanity Checks for Saliency Maps , 2018, NeurIPS.

[16]  Yi Yang,et al.  A Discriminatively Learned CNN Embedding for Person Reidentification , 2016, ACM Trans. Multim. Comput. Commun. Appl..

[17]  Koushik Sen,et al.  FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage , 2017, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[18]  Edwin Lughofer,et al.  FLEXFIS: A Robust Incremental Learning Approach for Evolving Takagi–Sugeno Fuzzy Models , 2008, IEEE Transactions on Fuzzy Systems.

[19]  Kai Chen,et al.  Black-box testing based on colorful taint analysis , 2011, Science China Information Sciences.

[20]  Abhik Roychoudhury,et al.  Coverage-Based Greybox Fuzzing as Markov Chain , 2016, IEEE Transactions on Software Engineering.

[21]  Abhik Roychoudhury,et al.  Directed Greybox Fuzzing , 2017, CCS.

[22]  Chao Zhang,et al.  CollAFL: Path Sensitive Fuzzing , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[23]  E. L. Lehmann,et al.  Theory of point estimation , 1950 .

[24]  Pedram Amini,et al.  Fuzzing: Brute Force Vulnerability Discovery , 2007 .

[25]  Peiyuan Zong,et al.  SemFuzz: Semantics-based Automatic Generation of Proof-of-Concept Exploits , 2017, CCS.

[26]  Junfeng Yang,et al.  NEUZZ: Efficient Fuzzing with Neural Program Learning , 2018, ArXiv.