Learn&Fuzz: Machine learning for input fuzzing

Fuzzing consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code. In this paper, we show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. We present a detailed case study with a complex input format, namely PDF, and a large, complex, security-critical parser for this format, namely the PDF parser embedded in Microsoft's new Edge browser. We discuss and measure the tension between conflicting learning and fuzzing goals: learning wants to capture the structure of well-formed inputs, while fuzzing wants to break that structure in order to cover unexpected code paths and find bugs. We also present a new algorithm for this learn&fuzz challenge, which uses a learnt input probability distribution to intelligently guide where to fuzz inputs.
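To make the last idea concrete, the following is a minimal sketch of how a learnt probability distribution could guide where to fuzz. It is an illustration under stated assumptions, not the algorithm of the paper: a character-level bigram model stands in for the neural network, and the rule used here (where the model is confident about the sampled character, occasionally emit the least likely character instead) is only one plausible reading of the abstract. The names `train_bigram`, `generate`, `fuzz_rate`, `confidence` and the `END` marker are all hypothetical.

```python
import random
from collections import Counter, defaultdict

END = "\0"  # hypothetical end-of-input marker, not part of any real format


def train_bigram(samples):
    """Estimate P(next char | previous char) from sample inputs."""
    counts = defaultdict(Counter)
    for sample in samples:
        prev = END
        for ch in sample + END:
            counts[prev][ch] += 1
            prev = ch
    return {
        prev: {ch: n / sum(ctr.values()) for ch, n in ctr.items()}
        for prev, ctr in counts.items()
    }


def generate(model, alphabet, rng, fuzz_rate=0.0, confidence=0.9, max_len=200):
    """Sample one input; return (text, list of positions that were fuzzed)."""
    out, fuzzed, prev = [], [], END
    while len(out) < max_len:
        dist = model[prev]
        chars = list(dist)
        ch = rng.choices(chars, weights=[dist[c] for c in chars])[0]
        if ch == END:
            break
        # Assumed rule: fuzz only where the model is confident about the
        # sampled character, replacing it with the least likely character.
        if dist[ch] >= confidence and rng.random() < fuzz_rate:
            ch = min(alphabet, key=lambda c: dist.get(c, 0.0))
            fuzzed.append(len(out))
        out.append(ch)
        prev = ch
    return "".join(out), fuzzed


# Toy samples loosely shaped like PDF dictionary objects (made up for this sketch).
samples = ["<< /Type /Page >>", "<< /Type /Font >>", "<< /Length 42 >>"]
model = train_bigram(samples)
alphabet = sorted(set("".join(samples)))

rng = random.Random(0)
print(generate(model, alphabet, rng))                 # learning only
print(generate(model, alphabet, rng, fuzz_rate=0.2))  # learning plus fuzzing
```

With `fuzz_rate=0.0` the generator only reproduces character transitions seen in the samples, which serves the learning goal. Raising `fuzz_rate` breaks that structure at the positions where the model is most certain, which serves the fuzzing goal; the two parameters give a rough handle on the tension the abstract describes.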
