Paper information - EqBench: A Dataset of Equivalent and Non-equivalent Program Pairs

Equivalence checking techniques help establish whether two versions of a program exhibit the same behavior. Most popular techniques for formally proving or refuting equivalence are evaluated on small and simplistic benchmarks that omit "difficult" programming constructs, such as non-linear arithmetic, loops, floating-point arithmetic, and string and array manipulation. This hinders efficient evaluation of these techniques and makes it hard to establish their practical applicability in real scenarios. This paper addresses this gap by contributing EqBench, the largest and most comprehensive benchmark for equivalence checking analysis, which contains 147 equivalent and 125 non-equivalent cases in both C and Java. We believe EqBench can facilitate a more realistic evaluation of equivalence checking techniques, assessing their individual strengths and weaknesses. EqBench is publicly available at: https://osf.io/93s5b/.
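
To make the notion of equivalent and non-equivalent program pairs concrete, here is a minimal sketch in Python. The functions are hypothetical and are not taken from EqBench, whose cases are written in C and Java. The helper `find_counterexample` is also an assumption made for illustration: it performs bounded differential testing, which can refute equivalence by finding a differing input but cannot prove equivalence the way a formal checker aims to.

```python
# Illustrative only: hypothetical functions, not cases from EqBench.

def sum_loop(n: int) -> int:
    """Original version: sum of 1..n computed with a loop (0 when n <= 0)."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_closed_form(n: int) -> int:
    """Refactored version intended to be equivalent to sum_loop."""
    if n <= 0:
        return 0
    return n * (n + 1) // 2  # non-linear arithmetic replaces the loop


def sum_closed_form_unguarded(n: int) -> int:
    """Refactored version that is NOT equivalent: no guard for negative n."""
    return n * (n + 1) // 2


def find_counterexample(f, g, inputs):
    """Return the first input on which f and g disagree, or None.

    Bounded testing can only refute equivalence; a None result means
    no difference was observed on the inputs tried, not a proof.
    """
    for x in inputs:
        if f(x) != g(x):
            return x
    return None


if __name__ == "__main__":
    inputs = range(-5, 50)
    print(find_counterexample(sum_loop, sum_closed_form, inputs))
    print(find_counterexample(sum_loop, sum_closed_form_unguarded, inputs))
```

The second pair differs only on some negative inputs, which is the kind of narrow behavioral difference that small, simplistic benchmarks may fail to exercise.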
