Greybox fuzzing as a contextual bandits problem

Greybox fuzzing is one of the most effective techniques for detecting bugs in large-scale application programs, and it requires only lightweight instrumentation. American Fuzzy Lop (AFL) is a popular coverage-based evolutionary greybox fuzzing tool. AFL performs remarkably well at fuzz testing large applications and finding critical vulnerabilities, but it relies on a number of heuristics: for deciding which test cases are favored, for skipping test cases during fuzzing, and for assigning fuzzing iterations (energy) to a test case. In this work, we aim to replace the heuristic that AFL uses to assign fuzzing iterations to a test case during random fuzzing. We formalize this problem as a contextual bandit problem and propose an algorithm to solve it. We have implemented our approach on top of AFL, replacing its heuristic with a model learned through the policy gradient method. Given a fixed-length substring of the test case to be fuzzed, our learning algorithm selects a multiplier for the number of fuzzing iterations assigned to that test case during random fuzzing. We fuzz the test case with this new energy value and continuously update the policy based on the interesting test cases that fuzzing produces.
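The abstract does not spell out the learning loop, so the sketch below illustrates one plausible instantiation: a linear softmax policy over a discrete set of energy multipliers, updated with a REINFORCE-style policy gradient, where the context is a fixed-length byte prefix of the test case and the reward is the number of interesting (coverage-increasing) inputs found in a round. The class name `EnergyBandit`, the multiplier set, and the `run_fuzzer` callback are illustrative assumptions, not the authors' actual implementation or AFL's API.

```python
import numpy as np

class EnergyBandit:
    """Contextual bandit over energy multipliers, trained with REINFORCE.

    Hypothetical sketch: a linear softmax policy mapping a fixed-length
    byte context to a distribution over energy multipliers.
    """

    def __init__(self, context_len=32, multipliers=(0.25, 0.5, 1.0, 2.0, 4.0),
                 lr=0.1, rng_seed=0):
        self.multipliers = multipliers
        self.lr = lr
        self.rng = np.random.default_rng(rng_seed)
        # One weight vector per arm (multiplier).
        self.W = np.zeros((len(multipliers), context_len))

    def context(self, data: bytes) -> np.ndarray:
        """Fixed-length prefix of the test case, zero-padded, scaled to [0, 1]."""
        n = self.W.shape[1]
        x = np.zeros(n)
        chunk = np.frombuffer(data[:n], dtype=np.uint8)
        x[:len(chunk)] = chunk / 255.0
        return x

    def choose(self, x: np.ndarray):
        """Sample an arm (multiplier index) from the softmax policy."""
        logits = self.W @ x
        p = np.exp(logits - logits.max())
        p /= p.sum()
        arm = self.rng.choice(len(self.multipliers), p=p)
        return arm, p

    def update(self, x, arm, p, reward):
        """REINFORCE step: W += lr * reward * grad log pi(arm | x)."""
        onehot = np.eye(len(self.multipliers))[arm]
        self.W += self.lr * reward * np.outer(onehot - p, x)


def fuzz_round(bandit, test_case: bytes, base_energy: int, run_fuzzer):
    """One round: pick an energy, fuzz, reward = # interesting inputs found.

    run_fuzzer is a placeholder callback standing in for the fuzzer's
    mutation loop; it should return the count of coverage-increasing inputs.
    """
    x = bandit.context(test_case)
    arm, p = bandit.choose(x)
    energy = int(base_energy * bandit.multipliers[arm])
    reward = run_fuzzer(test_case, energy)
    bandit.update(x, arm, p, reward)
    return energy, reward
```

In practice one would likely subtract a baseline (e.g., a running mean of the reward) before the update to reduce gradient variance; note also that the reward only arrives after the whole fuzzing round completes, which is the delayed-feedback bandit setting.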
