DataMill: rigorous performance evaluation made easy

Empirical systems research faces a dilemma: minor aspects of an experimental setup can significantly affect the associated performance measurements and potentially invalidate the conclusions drawn from them. Examples of such influences, often called hidden factors, include binary link order, process environment size, compiler-generated randomized symbol names, and group scheduler assignments. The growing complexity and size of modern systems will further aggravate this dilemma, especially under the time pressure of producing results. How, then, can one trust any reported empirical analysis of a new idea or concept in computer science? This paper introduces DataMill, a community-based, easy-to-use, service-oriented, open benchmarking infrastructure for performance evaluation. DataMill facilitates producing robust, reliable, and reproducible results. The infrastructure incorporates the latest findings on hidden factors and automates their variation. Multiple research groups already participate in DataMill. DataMill is also of interest for research on performance evaluation itself: it supports quantifying the effect of hidden factors and disseminating those results beyond mere reporting, and it provides a platform for investigating interactions and compositions of hidden factors.
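
The abstract notes that a hidden factor as mundane as process environment size can skew measurements, and that DataMill automates varying such factors. As a rough illustration of what "varying a hidden factor" means in practice, the sketch below (not DataMill's actual code) pads the environment to several sizes and times a placeholder benchmark at each level; the command `./my_benchmark`, the `DATAMILL_PAD` variable, and the pad sizes are all hypothetical stand-ins.

```python
#!/usr/bin/env python3
"""Minimal sketch, not DataMill's implementation: vary one hidden factor
(process environment size) across benchmark runs and record wall-clock
timings, illustrating the kind of automated factor variation the paper
describes."""

import csv
import os
import subprocess
import time

ENV_PAD_SIZES = [0, 128, 1024, 4096]   # bytes of padding added to the environment (illustrative levels)
RUNS_PER_SETUP = 5                     # repetitions per factor level

def run_once(pad_bytes: int) -> float:
    """Run the benchmark once with the environment padded by pad_bytes."""
    env = dict(os.environ)
    env["DATAMILL_PAD"] = "x" * pad_bytes   # hypothetical padding variable
    start = time.perf_counter()
    subprocess.run(["./my_benchmark"], env=env, check=True,
                   stdout=subprocess.DEVNULL)
    return time.perf_counter() - start

def main() -> None:
    with open("results.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["env_pad_bytes", "run", "seconds"])
        for pad in ENV_PAD_SIZES:
            for run in range(RUNS_PER_SETUP):
                writer.writerow([pad, run, run_once(pad)])

if __name__ == "__main__":
    main()
```

A full evaluation along these lines would also vary other factors (link order, scheduler assignment) and apply statistical analysis across levels, which is the kind of setup DataMill aims to automate for its users.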
