Paper information - Skeletal program enumeration for rigorous compiler testing

Skeletal program enumeration for rigorous compiler testing

A program can be viewed as a syntactic structure P (a syntactic skeleton) parameterized by a collection of identifiers V (variable names). This paper introduces the skeletal program enumeration (SPE) problem: given a syntactic skeleton P and a set of variables V, enumerate a set of programs exhibiting all possible variable usage patterns within P. It proposes an effective realization of SPE for systematic, rigorous compiler testing by leveraging three important observations: (1) programs with different variable usage patterns exhibit diverse control and data dependences and help exercise different compiler optimizations; (2) most real compiler bugs were revealed by small tests (i.e., small-sized P), and this “small-scope” observation makes SPE practical for compiler validation; and (3) SPE is exhaustive w.r.t. a given syntactic skeleton and variable set, offering a level of guarantee absent from all existing compiler testing techniques. The key challenge of SPE is how to eliminate the enormous number of programs that are equivalent w.r.t. α-conversion. Our main technical contribution is a novel algorithm for computing the canonical (and smallest) set of all non-α-equivalent programs. To demonstrate its practical utility, we have applied the SPE technique to test C/C++ compilers using syntactic skeletons derived from their own regression test suites. Our evaluation results are extremely encouraging. In less than six months, our approach has led to 217 confirmed GCC/Clang bug reports, 119 of which have already been fixed; the majority had long been latent despite extensive prior testing efforts. Compared with enumerating every variable assignment, our SPE algorithm also reduces the number of programs to consider by six orders of magnitude. Moreover, in three weeks, our technique has found 29 CompCert crashing bugs and 42 bugs in two Scala optimizing compilers. These results demonstrate our SPE technique’s generality and further illustrate its effectiveness.
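To make the idea of enumerating only non-α-equivalent programs concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes a single scope in which all variables are interchangeable, so two fillings count as equivalent exactly when one is a consistent renaming of the other. Under that assumption, one representative per class can be produced with restricted growth strings: each hole either reuses a variable that has already appeared or introduces the next unused one. The function names `naive_fillings`, `canonical_fillings`, and `fill`, and the toy skeleton, are hypothetical and chosen only for illustration.

```python
from itertools import product


def naive_fillings(num_holes, variables):
    """Every assignment of a variable to each hole, duplicates up to renaming included."""
    return list(product(variables, repeat=num_holes))


def canonical_fillings(num_holes, variables):
    """One representative per renaming class (single-scope assumption).

    Uses restricted growth strings: hole i may reuse any variable index
    seen so far, or introduce the next fresh index, never skipping one.
    """
    k = len(variables)
    results = []

    def extend(prefix, used):
        if len(prefix) == num_holes:
            results.append(tuple(variables[i] for i in prefix))
            return
        # Indices 0..used-1 are already introduced; index `used` is the next
        # fresh variable, allowed only if the variable set is not exhausted.
        for i in range(min(used + 1, k)):
            extend(prefix + [i], max(used, i + 1))

    extend([], 0)
    return results


def fill(skeleton, filling):
    """Instantiate a toy skeleton whose holes are str.format placeholders."""
    return skeleton.format(*filling)


if __name__ == "__main__":
    skeleton = "{0} = {1} - {2};"   # toy skeleton with three variable holes
    variables = ["a", "b"]
    print(len(naive_fillings(3, variables)), "naive fillings")
    for filling in canonical_fillings(3, variables):
        print(fill(skeleton, filling))
```

With three holes and two variables, the naive enumeration yields 2^3 = 8 programs, while the canonical one yields 4, because fillings such as `a = a - b;` and `b = b - a;` differ only by renaming. The gap widens quickly as the number of holes and variables grows, which is the effect the abstract's reduction figure refers to; handling real programs with nested scopes and types requires more than this sketch.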
