Finding missed compiler optimizations by differential testing

Randomized differential testing of compilers has had great success in finding compiler crashes and silent miscompilations. In this paper we investigate whether we can use similar techniques to improve the quality of the generated code: Can we compare the code generated by different compilers to find optimizations performed by one but missed by another? We have developed a set of tools for running such tests. We compile C code generated by standard random program generators and use a custom binary analysis tool to compare the output programs. Depending on the optimization of interest, the tool can be configured to compare features such as the number of total instructions, multiply or divide instructions, function calls, stack accesses, and more. A standard test case reduction tool produces minimal examples once an interesting difference has been found. We have used our tools to compare the code generated by GCC, Clang, and CompCert. We have found previously unreported missing arithmetic optimizations in all three compilers, as well as individual cases of unnecessary register spilling, missed opportunities for register coalescing, dead stores, redundant computations, and missing instruction selection patterns.
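The comparison step described above can be illustrated with a minimal sketch. This is not the paper's actual tool; it is a hypothetical illustration, assuming textual assembly listings as input, of how one might count instructions of a given class (here, multiplies) in two compilers' outputs and flag a difference as a potential missed optimization. All function names and the example listings are invented for illustration.

```python
import re

def count_instructions(asm, mnemonics=None):
    """Count instruction lines in an AT&T-style assembly listing.
    If `mnemonics` is given, count only instructions whose mnemonic
    starts with one of those prefixes (e.g. 'mul', 'div')."""
    count = 0
    for line in asm.splitlines():
        line = line.strip()
        # Skip blank lines, labels, and assembler directives.
        if not line or line.endswith(':') or line.startswith('.'):
            continue
        mnemonic = line.split()[0]
        if mnemonics is None or any(mnemonic.startswith(m) for m in mnemonics):
            count += 1
    return count

# Two hypothetical translations of `return x * 8;`: one compiler
# strength-reduces the multiply to a shift, the other emits a multiply.
asm_a = """
f:
    shlq $3, %rdi
    movq %rdi, %rax
    ret
"""
asm_b = """
f:
    imulq $8, %rdi, %rax
    ret
"""

# A positive difference in multiply counts flags asm_b's compiler as
# having potentially missed a strength-reduction opportunity.
diff = count_instructions(asm_b, ['mul', 'imul']) - \
       count_instructions(asm_a, ['mul', 'imul'])
print(diff)
```

In the paper's setting, an analogous comparison runs over binaries produced from randomly generated C programs, and any interesting difference is then shrunk to a minimal example with a standard test-case reducer.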
