An empirical study of optimization bugs in GCC and LLVM

Abstract

Optimizations are a fundamental component of compilers. Bugs in optimizations can have significant impact: they can cause unintended application behavior and even disasters, especially in safety-critical domains. An in-depth analysis of optimization bugs is therefore needed to help developers understand and test compiler optimizations. To this end, we conduct an empirical study of the characteristics of optimization bugs in two mainstream compilers, GCC and LLVM. We collect about 57K GCC bugs and 22K LLVM bugs, and then exhaustively examine 8,771 and 1,564 optimization bugs of the two compilers, respectively. The results reveal the following five characteristics of optimization bugs: (1) optimization is the buggiest component in both compilers, apart from the C++ component; (2) the value range propagation optimization and the instruction combine optimization are the buggiest optimizations in GCC and LLVM, respectively, and loop optimizations in both compilers are more bug-prone than other optimizations; (3) most optimization bugs in both GCC and LLVM are misoptimization bugs, accounting for 57.21% and 61.38%, respectively; (4) on average, optimization bugs live for over five months, and developers take 11.16 months for GCC and 13.55 months for LLVM to fix an optimization bug; in both compilers, many confirmed optimization bugs have lived for a long time; (5) fixes for optimization bugs involve no more than two files and three functions on average in both compilers; around 99% of them modify no more than 100 lines of code, and 90% modify fewer than 50 lines. Our study provides a deeper understanding of optimization bugs and can guide developers and researchers toward better design of compiler optimizations. The results also suggest that more effective techniques and tools are needed to test compiler optimizations. Moreover, our findings are useful for research on automatic debugging techniques for compilers, such as automatic compiler bug isolation.
