Static duplicate bug-report identification for compilers

Compiler bug reports are important for assuring compiler quality; however, duplicate bug reports incur extra costs. We propose IdenDup, a static approach to identifying duplicate compiler bug reports. IdenDup targets two scenarios, fuzz testing and the bug-management system, and uses static text and program information: lexical features, syntax features, and newly proposed dataflow features that capture variable-usage paths (i.e., how variables are used and in what order). We empirically evaluated IdenDup on GCC and LLVM; the results show that it effectively identifies duplicate bug reports in both scenarios and outperforms existing approaches.
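To make the feature families above more concrete, the following is a minimal illustrative sketch, not IdenDup's actual implementation. It assumes the bug-triggering test programs are C source strings. It approximates lexical features with a token bag and variable-usage paths with a crude per-identifier sequence of "def"/"use" events. Syntax features, which would require a real parser, are omitted. All names (`lexical_features`, `variable_usage_paths`, `dataflow_features`, `similarity`), the regex tokenizer, the keyword list, and the equal weighting are hypothetical choices made only for illustration.

```python
import re
from collections import Counter
from math import sqrt

# Assumption: a small, incomplete keyword set is enough for illustration.
C_KEYWORDS = {"int", "char", "long", "short", "unsigned", "void", "return",
              "if", "else", "for", "while", "do", "static", "struct"}

# Identifiers, integer literals, or single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def lexical_features(source: str) -> Counter:
    """Bag of tokens: how often each token occurs in the program."""
    return Counter(TOKEN_RE.findall(source))


def variable_usage_paths(source: str) -> dict:
    """Map each non-keyword identifier to its ordered 'def'/'use' events.

    Heuristic: an identifier directly followed by a single '=' (not '==')
    counts as a definition; every other occurrence counts as a use.
    """
    tokens = TOKEN_RE.findall(source)
    paths = {}
    for i, tok in enumerate(tokens):
        if not re.match(r"[A-Za-z_]", tok) or tok in C_KEYWORDS:
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        nxt2 = tokens[i + 2] if i + 2 < len(tokens) else ""
        event = "def" if nxt == "=" and nxt2 != "=" else "use"
        paths.setdefault(tok, []).append(event)
    return paths


def dataflow_features(source: str) -> Counter:
    """Count usage-path shapes, ignoring the variable names themselves."""
    return Counter(">".join(p) for p in variable_usage_paths(source).values())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(prog_a: str, prog_b: str, weight: float = 0.5) -> float:
    """Weighted mix of lexical and dataflow similarity (weight is arbitrary)."""
    lex = cosine(lexical_features(prog_a), lexical_features(prog_b))
    flow = cosine(dataflow_features(prog_a), dataflow_features(prog_b))
    return weight * lex + (1 - weight) * flow


prog_a = "int main() { int x = 1; int y = x + 2; return y; }"
prog_b = "int main() { int a = 1; int b = a + 2; return b; }"
print(similarity(prog_a, prog_b))
```

In this sketch the two programs differ only in variable names, so their usage-path shapes coincide while their token bags do not. This illustrates why name-independent variable-usage information could complement purely lexical features when comparing test programs.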