FuzzBench: an open fuzzer benchmarking platform and service

Fuzzing is a key tool for reducing bugs in production software. At Google, fuzzing has uncovered tens of thousands of bugs. Fuzzing is also a popular subject of academic research: in 2020 alone, over 120 papers were published on improving, developing, and evaluating fuzzers and fuzzing techniques. Yet proper evaluation of fuzzing techniques remains elusive, and the community has struggled to converge on a methodology and standard tools for fuzzer evaluation. To address this problem, we introduce FuzzBench, an open-source turnkey platform and free service for evaluating fuzzers. It aims to be easy to use, fast, and reliable, and to provide reproducible experiments. Since its release in March 2020, FuzzBench has been widely used in both industry and academia, carrying out more than 150 experiments for external users. It has supported several published and in-progress papers from academic groups, and it has had real impact on the most widely used fuzzing tools in industry. The case studies we present suggest that FuzzBench is on its way to becoming a standard fuzzer benchmarking platform.
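
To give a concrete sense of how the platform is used, the sketch below shows the general shape of a FuzzBench fuzzer integration: per the project's public documentation, each fuzzer contributes a fuzzer.py module whose build() and fuzz() hooks FuzzBench calls to compile a benchmark and run a trial. The AFL paths, environment variables, and flags here are illustrative assumptions, not details taken from the abstract above.

"""Minimal sketch of a FuzzBench fuzzer integration (fuzzer.py).

Assumes FuzzBench's documented integration interface: the platform
imports this module, calls build() once to compile the benchmark
with the fuzzer's instrumentation, then calls fuzz() to run a trial.
The AFL binary locations and flags below are illustrative assumptions.
"""
import os
import shutil
import subprocess


def build():
    """Compile the benchmark with the fuzzer's instrumentation.

    FuzzBench-style integrations point the benchmark's build script
    at the fuzzer's compiler wrappers via environment variables
    (the paths here are assumptions for this sketch).
    """
    os.environ['CC'] = '/afl/afl-clang-fast'
    os.environ['CXX'] = '/afl/afl-clang-fast++'
    # Link libFuzzer-style harnesses against a driver that gives
    # them a main() the fuzzer can execute.
    os.environ['FUZZER_LIB'] = '/afl/libAFLDriver.a'
    subprocess.check_call(['/bin/bash', '-ex', 'build.sh'])
    # Ship the fuzzer binary next to the target ($OUT is assumed
    # to be set by the platform) so fuzz() can invoke it.
    shutil.copy('/afl/afl-fuzz', os.environ['OUT'])


def fuzz(input_corpus, output_corpus, target_binary):
    """Run the fuzzer until the platform ends the trial.

    The platform passes the seed corpus directory, the directory
    where generated inputs must be written, and the instrumented
    target binary.
    """
    # AFL refuses to start with an empty seed directory.
    if not os.listdir(input_corpus):
        with open(os.path.join(input_corpus, 'seed'), 'wb') as f:
            f.write(b'hi')
    # Assumes the trial's working directory is the output directory
    # that build() copied afl-fuzz into.
    subprocess.check_call([
        './afl-fuzz',
        '-i', input_corpus,
        '-o', output_corpus,
        '--', target_binary,
    ])

In the actual service, such a module is packaged into a Docker image and executed across many benchmarks and repeated trials; the sketch only illustrates the interface, not the infrastructure around it.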
