A Systematic Impact Study for Fuzzer-Found Compiler Bugs

Despite much recent interest in compiler fuzzing, the practical impact of fuzzer-found miscompilations on real-world applications has barely been assessed. We present the first quantitative and qualitative study of the tangible impact of fuzzer-found compiler bugs. We follow a novel methodology in which the impact of a miscompilation bug is evaluated based on (1) whether the bug appears to trigger during compilation; (2) the extent to which the generated assembly code changes syntactically when the bug triggers; and (3) how likely such changes are to cause runtime divergences during execution. The study covers the compilation of more than 10 million lines of C/C++ code from 309 Debian packages, using 12% of the historical, now-fixed miscompilation bugs found by four state-of-the-art fuzzers in the Clang/LLVM compiler, as well as 18 other bugs found by the Alive formal verification tool or reported by human users. The results show that almost half of the fuzzer-found bugs propagate to the generated binaries for some of the packages, but they barely affect the syntax of those binaries and cause only two failures in total when the packages' regression test suites are run. Our manual analysis of a selection of the bugs suggests that they cannot trigger on the packages considered in the study, and that in general such bugs affect only corner cases with a low probability of occurring in practice. User-reported and Alive-found bugs do not exhibit a higher impact: they trigger less frequently and cause only one test failure.
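
To make stage (2) of this methodology concrete, the sketch below compiles a source file with two Clang builds assumed to differ only in a single bug-fix patch, and diffs the emitted assembly to see whether the bug leaves a syntactic trace. This is an illustrative approximation, not the study's actual tooling; the compiler paths are hypothetical placeholders.

```python
#!/usr/bin/env python3
"""Sketch: does a miscompilation bug syntactically affect generated code?

Assumes two Clang builds that differ only in the bug-fix patch; the paths
below are hypothetical placeholders, not artifacts from the study."""

import subprocess
import sys

CLANG_BUGGY = "/opt/clang-pre-fix/bin/clang"   # hypothetical build without the fix
CLANG_FIXED = "/opt/clang-post-fix/bin/clang"  # hypothetical build with the fix

def emit_asm(compiler: str, source: str) -> str:
    """Compile `source` to assembly text at -O2, writing it to stdout."""
    result = subprocess.run(
        [compiler, "-S", "-O2", "-o", "-", source],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def normalize(asm: str) -> str:
    """Drop .ident directives: they embed the compiler version string and
    would differ between the two builds even when codegen is identical."""
    return "\n".join(line for line in asm.splitlines()
                     if not line.lstrip().startswith(".ident"))

def bug_propagates(source: str) -> bool:
    """True if the two builds disagree on the generated assembly, i.e. the
    bug (or its fix) syntactically affects the code emitted for `source`."""
    return (normalize(emit_asm(CLANG_BUGGY, source))
            != normalize(emit_asm(CLANG_FIXED, source)))

if __name__ == "__main__":
    for src in sys.argv[1:]:
        verdict = "assembly differs" if bug_propagates(src) else "identical"
        print(f"{src}: {verdict}")
```

A syntactic difference found this way is only a necessary condition for runtime impact: stage (3) of the methodology still has to establish whether the changed instructions can cause an observable divergence, for instance via the package's regression test suite.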
