Finding deep compiler bugs via guided stochastic program mutation

Compiler testing is important and challenging. Equivalence Modulo Inputs (EMI) is a recent, promising approach to compiler validation. It mutates the statements of an existing program that are unexecuted under some set of inputs, producing new test programs that are equivalent to the original with respect to those inputs. Orion is a simple realization of EMI that only randomly deletes unexecuted statements. Despite its success in finding many bugs in production compilers, Orion's effectiveness is still limited by its simple, blind mutation strategy. To realize EMI more effectively, this paper introduces a guided, advanced mutation strategy based on Bayesian optimization. Our goal is to generate diverse programs that exercise compilers more thoroughly. We achieve this with two techniques: (1) support for both code deletions and insertions in the unexecuted regions, which leads to a much larger space of test programs; and (2) an objective function that promotes control-flow-diverse programs and guides Markov Chain Monte Carlo (MCMC) optimization in exploring the search space. Our technique helps discover deep bugs that require elaborate mutations. Our realization, Athena, targets C compilers. In 19 months, Athena has found 72 new bugs in GCC and LLVM, many of them deep and important. Developers have confirmed all 72 bugs and fixed 68 of them.
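
As a rough illustration of the search loop described above, the following Python sketch runs a Metropolis-style walk over a toy program whose statements are tagged as executed or unexecuted. It is not Athena's implementation: the program model, the snippet pool, the `propose` and `control_flow_count` helpers, and the scoring rule (difference in the number of control-flow statements relative to the seed) are all assumptions made for this example, and the acceptance rule ignores any asymmetry in the proposal distribution.

```python
import math
import random

# Hypothetical pool of statements that may be inserted into unexecuted code.
SNIPPET_POOL = ("if (a) b = 1;", "while (c) c--;", "x = y + 1;", "goto done;")
CONTROL_KEYWORDS = ("if", "for", "while", "goto")


def propose(program, rng):
    """Delete or insert one statement, touching only unexecuted statements."""
    dead = [i for i, (_, executed) in enumerate(program) if not executed]
    if not dead:
        return program  # nothing can be mutated without changing behavior
    i = rng.choice(dead)
    # Toy restriction: keep at least one unexecuted statement as an anchor.
    if len(dead) > 1 and rng.random() < 0.5:
        return program[:i] + program[i + 1:]  # deletion
    new_stmt = (rng.choice(SNIPPET_POOL), False)
    # Insert right after an unexecuted statement so the new code stays dead.
    return program[:i + 1] + (new_stmt,) + program[i + 1:]


def control_flow_count(program):
    """Count statements that start with a control-flow keyword (toy metric)."""
    return sum(text.startswith(CONTROL_KEYWORDS) for text, _ in program)


def mcmc_explore(seed, steps=200, beta=1.0, rng_seed=0):
    """Return the accepted variants of a Metropolis-style walk from `seed`."""
    rng = random.Random(rng_seed)

    def score(program):
        # Stand-in objective: reward control-flow difference from the seed.
        return abs(control_flow_count(program) - control_flow_count(seed))

    current = seed
    variants = []
    for _ in range(steps):
        candidate = propose(current, rng)
        delta = score(candidate) - score(current)
        # Always accept non-worse candidates; accept worse ones with
        # probability exp(beta * delta), where delta is negative.
        if delta >= 0 or rng.random() < math.exp(beta * delta):
            current = candidate
            variants.append(current)
    return variants


# Seed program: the branch body is unexecuted for the chosen input.
SEED = (
    ("int x = input();", True),
    ("if (x < 0) {", True),
    ("x = -x;", False),
    ("}", True),
    ("return x;", True),
)
variants = mcmc_explore(SEED)
```

In an EMI workflow, each accepted variant would then be compiled and run on the same inputs as the seed, and any difference in output would point to a potential compiler bug. In this toy model the executed statements are never modified, which is what keeps every variant equivalent to the seed on those inputs.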
