Large-Scale Evaluation of the Efficiency of Runtime-Verification Tools in the Wild

Runtime verification (RV) is a field of study that suffers from a lack of dedicated benchmarks. Many published evaluations of RV tools rely on workloads that are not representative of real-world programs. In this paper, we present a methodology for automatically discovering relevant open-source projects for evaluating RV tools, based on analyzing the unit tests of a large number of projects hosted on GitHub. Our evaluation shows that analyzing a large number of open-source projects, rather than a handful of manually selected workloads, provides better insight into the behavior of three state-of-the-art RV tools (JavaMOP, MarQ, and Muffin) with respect to two metrics: memory utilization and runtime overhead. By monitoring the test executions of a large number of projects, we show that none of the evaluated RV tools wins on both metrics.
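To make the measurement concrete, the following is a minimal sketch, not the authors' actual harness, of how runtime overhead could be obtained for a single discovered project: run its test suite once without instrumentation and once with an RV tool attached as a Java agent, then report the ratio of the two wall-clock times. The Maven invocation, the agent jar path, and the class and method names are illustrative assumptions.

```java
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

// Hypothetical probe: measures test-suite runtime with and without an RV agent.
public class OverheadProbe {

    // Runs `mvn test` in the given project directory and returns wall-clock time in ms.
    static long runTests(Path project, String mavenOpts) throws Exception {
        ProcessBuilder pb = new ProcessBuilder("mvn", "-q", "test");
        pb.directory(project.toFile());
        pb.environment().put("MAVEN_OPTS", mavenOpts);
        pb.inheritIO();
        long start = System.nanoTime();
        Process p = pb.start();
        if (!p.waitFor(2, TimeUnit.HOURS) || p.exitValue() != 0) {
            throw new IllegalStateException("test run failed or timed out: " + project);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        Path project = Path.of(args[0]);   // a cloned GitHub project (assumption)
        String agentJar = args[1];         // e.g. an RV tool packaged as a -javaagent (assumption)

        long baselineMs = runTests(project, "");
        long monitoredMs = runTests(project, "-javaagent:" + agentJar);

        // Overhead is reported as the ratio of monitored to baseline test time.
        System.out.printf("baseline=%dms monitored=%dms overhead=%.2fx%n",
                baselineMs, monitoredMs, (double) monitoredMs / baselineMs);
    }
}
```

Memory utilization could be collected analogously, for example by having the monitored JVM log its peak heap usage during each test run; the same two-run comparison then yields a per-project memory overhead.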
