Optimizing Seed Selection for Fuzzing

Randomly mutating well-formed program inputs or simply fuzzing, is a highly effective and widely used strategy to find bugs in software. Other than showing fuzzers find bugs, there has been little systematic effort in understanding the science of how to fuzz properly. In this paper, we focus on how to mathematically formulate and reason about one critical aspect in fuzzing: how best to pick seed files to maximize the total number of bugs found during a fuzz campaign. We design and evaluate six different algorithms using over 650 CPU days on Amazon Elastic Compute Cloud (EC2) to provide ground truth data. Overall, we find 240 bugs in 8 applications and show that the choice of algorithm can greatly increase the number of bugs found. We also show that current seed selection strategies as found in Peach may fare no better than picking seeds at random. We make our data set and code publicly available.

[1]  David A. Wagner,et al.  Dynamic Test Generation to Find Integer Bugs in x86 Binary Linux Programs , 2009, USENIX Security Symposium.

[2]  David Brumley,et al.  Scheduling black-box mutational fuzzing , 2013, CCS.

[3]  Patrice Godefroid,et al.  SAGE: Whitebox Fuzzing for Security Testing , 2012, ACM Queue.

[4]  David S. Johnson,et al.  Approximation algorithms for combinatorial problems , 1973, STOC.

[5]  Barton P. Miller,et al.  An empirical study of the reliability of UNIX utilities , 1990, Commun. ACM.

[6]  Ralph Langner,et al.  Stuxnet: Dissecting a Cyberwarfare Weapon , 2011, IEEE Security & Privacy.

[7]  Zhenkai Liang,et al.  Polyglot: automatic extraction of protocol message format using dynamic binary analysis , 2007, CCS '07.

[8]  Ronald L. Rivest,et al.  Introduction to Algorithms, Second Edition , 2001 .

[9]  Allen D. Householder,et al.  Probability-Based Parameter Selection for Black-Box Fuzz Testing , 2012 .

[10]  Helen J. Wang,et al.  Tupni: automatic reverse engineering of input formats , 2008, CCS.

[11]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[12]  Vasek Chvátal,et al.  A Greedy Heuristic for the Set-Covering Problem , 1979, Math. Oper. Res..

[13]  Radu State,et al.  Spectral Fuzzing: Evaluation & Feedback , 2010 .

[14]  Xin-She Yang,et al.  Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.

[15]  Nicholas Nethercote,et al.  Valgrind: A Program Supervision Framework , 2003, RV@CAV.

[16]  Patrice Godefroid,et al.  Automated Whitebox Fuzz Testing , 2008, NDSS.

[17]  Richard McNally,et al.  Fuzzing: The State of the Art , 2012 .