Toward understanding compiler bugs in GCC and LLVM

Compilers are critical, widely used, and complex pieces of software. Bugs in them have significant impact and can cause serious damage when they silently miscompile a safety-critical application. An in-depth understanding of compiler bugs can help detect and fix them. To this end, we conduct the first empirical study of the characteristics of bugs in two mainstream compilers, GCC and LLVM. Our study is significant in scale: it exhaustively examines about 50K bugs and 30K bug-fix revisions spanning more than a decade. This paper details our systematic study. Summary findings include: (1) in both compilers, C++ is the buggiest component, accounting for around 20% of all bugs and twice as many as the second buggiest component; (2) bug-revealing test cases are typically small, with 80% having fewer than 45 lines of code; (3) most bug fixes touch a single source file with small modifications (43 lines for GCC and 38 for LLVM on average); (4) the average lifetime of a bug is 200 days in GCC and 111 days in LLVM; and (5) high priority tends to be assigned to optimizer bugs; most notably, 30% of the bugs in GCC’s inter-procedural analysis component are labeled P1 (the highest priority). This study deepens our understanding of compiler bugs. For application developers, it shows that even mature production compilers still have many bugs, which may affect development. For researchers and compiler developers, it sheds light on interesting characteristics of compiler bugs and highlights challenges and opportunities to test and debug compilers more effectively.
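Findings (2) and (4) are simple aggregates over per-bug records. The Python sketch below shows one way to compute statistics of this kind. It is not the authors' tooling. The `BugRecord` type, its field names, and the toy dates are hypothetical illustrations. It also assumes that a bug's lifetime is the number of days between its report and its fix.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class BugRecord:
    """Hypothetical per-bug record; field names are illustrative assumptions."""
    reported: date       # date the bug report was filed
    fixed: date          # date the fix was committed
    test_case_loc: int   # lines of code in the bug-revealing test case


def lifetime_days(bug: BugRecord) -> int:
    # Assumption: lifetime = whole days from report to fix.
    return (bug.fixed - bug.reported).days


def average_lifetime(bugs: list[BugRecord]) -> float:
    # Arithmetic mean of per-bug lifetimes, as in finding (4).
    return mean(lifetime_days(b) for b in bugs)


def fraction_below(bugs: list[BugRecord], max_loc: int) -> float:
    # Share of bugs whose test case has fewer than max_loc lines, as in finding (2).
    return sum(b.test_case_loc < max_loc for b in bugs) / len(bugs)


if __name__ == "__main__":
    # Toy data only; not drawn from the GCC or LLVM bug trackers.
    toy = [
        BugRecord(date(2015, 1, 1), date(2015, 1, 31), 12),   # 30 days
        BugRecord(date(2015, 3, 1), date(2015, 6, 9), 30),    # 100 days
        BugRecord(date(2015, 5, 1), date(2015, 5, 21), 120),  # 20 days
    ]
    print(f"average lifetime: {average_lifetime(toy):.1f} days")  # average lifetime: 50.0 days
    print(f"share under 45 LOC: {fraction_below(toy, 45):.0%}")   # share under 45 LOC: 67%
```

Measured over the full GCC and LLVM bug sets, `fraction_below(bugs, 45)` corresponds to the "80% having fewer than 45 lines" figure, and `average_lifetime` corresponds to the 200-day and 111-day averages, provided the lifetime definition assumed above matches the study's.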
