Faster Fuzzing: Reinitialization with Deep Neural Models

We improve the performance of the American Fuzzy Lop (AFL) fuzz testing framework by using Generative Adversarial Network (GAN) models to reinitialize the system with novel seed files. We assess performance based on the temporal rate at which we produce novel and unseen code paths. We compare this approach to seed file generation from a random draw of bytes observed in the training seed files. The code path lengths and variations were not sufficiently diverse to fully replace AFL input generation. However, augmenting native AFL with these additional code paths demonstrated improvements over AFL alone. Specifically, experiments showed the GAN was faster and more effective than the LSTM and out-performed a random augmentation strategy, as measured by the number of unique code paths discovered. GAN helps AFL discover 14.23% more code paths than the random strategy in the same amount of CPU time, finds 6.16% more unique code paths, and finds paths that are on average 13.84% longer. Using GAN shows promise as a reinitialization strategy for AFL to help the fuzzer exercise deep paths in software.

[1]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[2]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[3]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[4]  Teresa Nicole Brooks,et al.  Survey of Automated Vulnerability Detection and Exploit Generation Techniques in Cyber Reasoning Systems , 2017, Advances in Intelligent Systems and Computing.

[5]  Yang Liu,et al.  Skyfire: Data-Driven Seed Generation for Fuzzing , 2017, 2017 IEEE Symposium on Security and Privacy (SP).

[6]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[7]  Christopher Krügel,et al.  Driller: Augmenting Fuzzing Through Selective Symbolic Execution , 2016, NDSS.

[8]  Christopher Krügel,et al.  SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[9]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[10]  Rupak Majumdar,et al.  Tools and Algorithms for the Construction and Analysis of Systems , 1997, Lecture Notes in Computer Science.

[11]  John C. Reynolds,et al.  Separation logic: a logic for shared mutable data structures , 2002, Proceedings 17th Annual IEEE Symposium on Logic in Computer Science.

[12]  Kathleen Martin,et al.  The Learning Machines. , 1981 .

[13]  Andrew W. Senior,et al.  Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.

[14]  Patrick Cousot,et al.  Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints , 1977, POPL.

[15]  Koen Claessen,et al.  QuickCheck: a lightweight tool for random testing of Haskell programs , 2000, ICFP.

[16]  John Hughes,et al.  Testing telecoms software with quviq QuickCheck , 2006, ERLANG '06.

[17]  Rishabh Singh,et al.  Learn&Fuzz: Machine learning for input fuzzing , 2017, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[18]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[19]  David Brumley,et al.  All You Ever Wanted to Know about Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask) , 2010, 2010 IEEE Symposium on Security and Privacy.

[20]  Christopher Krügel,et al.  Enemy of the State: A State-Aware Black-Box Web Vulnerability Scanner , 2012, USENIX Security Symposium.

[21]  David Brumley,et al.  Scheduling black-box mutational fuzzing , 2013, CCS.

[22]  Georg Struth,et al.  A Program Construction and Verification Tool for Separation Logic , 2015, MPC.

[23]  Dawson R. Engler,et al.  KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs , 2008, OSDI.

[24]  Yang Liu,et al.  Steelix: program-state based binary fuzzing , 2017, ESEC/SIGSOFT FSE.