Reproducibility, the extent to which consistent results are obtained when an experiment is repeated, is important as a means to validate experimental results, promote the integrity of research, and accelerate follow-up work. Commitment to artifact reviewing and badging seeks to promote reproducibility and to rank the quality of submitted artifacts. However, as illustrated in this issue, the current badging scheme, with its focus on whether an artifact is reusable, may not identify limitations of architecture, implementation, or evaluation. We propose that, to improve insight into artifact reproducibility, the depth and nature of artifact evaluation must move beyond simply considering whether an artifact is reusable. Artifact evaluation should consider both the methods used in that evaluation and the variation of its inputs. To achieve this, we suggest an extension to the scope of artifact badging, and describe approaches and best practices arising in other communities. We seek to promote conversation and to make a call to action intended to strengthen the scientific method within our domain.