Language-Agnostic Generation of Compilable Test Programs

Testing is an integral part of the development of compilers and other language processors. Random program generators, or fuzzers, have emerged to automatically create large sets of test programs. Unfortunately, existing approaches are either language-specific (and thus require a rewrite for each language) or may generate programs that violate the rules of the respective programming language (which limits their usefulness). This work introduces *Smith, a language-agnostic framework for the generation of valid, compilable test programs. It takes as input an abstract attribute grammar that specifies the syntactic and semantic rules of a programming language. It then creates test programs that satisfy all these rules. By aggressively pruning the search space and keeping the construction as local as possible, *Smith can generate huge, complex test programs in a short time. We present four case studies covering four real-world programming languages (C, Lua, SQL, and SMT-LIB 2) to show that *Smith is both efficient and effective, while being flexible enough to support programming languages that differ considerably. We found bugs in all four case studies. For example, *Smith detected 165 different crashes in older versions of GCC and LLVM. *Smith and the language grammars are available online.
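
To illustrate the core idea behind attribute-grammar-guided generation, the following minimal Python sketch threads a semantic attribute (a symbol table) through the derivation so that every generated program only reads variables it has already declared. This is a toy illustration under assumed names, not *Smith's actual grammar format or implementation.

```python
import random

# Hypothetical sketch of attribute-grammar-guided generation (not *Smith's API):
# each production receives an inherited attribute (the set of declared variables)
# and returns a synthesized attribute (the possibly extended set), so only
# semantically valid programs -- every used variable is declared first -- can arise.

def gen_expr(declared, depth=3):
    """Expression production: a literal, a declared variable, or a bounded binary op."""
    choices = ["literal"] + (["var"] if declared else []) + (["binop"] if depth > 0 else [])
    kind = random.choice(choices)
    if kind == "literal":
        return str(random.randint(0, 9))
    if kind == "var":
        return random.choice(sorted(declared))
    return f"({gen_expr(declared, depth - 1)} + {gen_expr(declared, depth - 1)})"

def gen_stmt(declared, fresh):
    """Statement production: declare a new variable or assign to an existing one."""
    if not declared or random.random() < 0.5:
        name = f"v{next(fresh)}"
        # The initializer is generated against the old symbol table, so it
        # cannot reference the variable being declared.
        return f"int {name} = {gen_expr(declared)};", declared | {name}
    target = random.choice(sorted(declared))
    return f"{target} = {gen_expr(declared)};", declared

def gen_program(num_stmts=10, seed=None):
    """Program production: thread the symbol table through all statements."""
    random.seed(seed)
    fresh = iter(range(10**6))  # supply of fresh variable names
    declared, lines = set(), []
    for _ in range(num_stmts):
        stmt, declared = gen_stmt(declared, fresh)
        lines.append(stmt)
    return "\n".join(lines)

if __name__ == "__main__":
    print(gen_program(num_stmts=8, seed=42))
```

In *Smith itself, the grammar, its attributes, and the semantic constraints are supplied by the user for each target language; the sketch hard-codes one toy language only to show how threading attributes through the derivation rules out invalid programs by construction.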
