Initial Experiments with Duet Benchmarking: Performance Testing Interference in the Cloud

Accurate performance testing may require many measurements and therefore many machines to execute on. When many machines are needed, the cloud offers a tempting solution; however, measurements conducted in the cloud are generally considered unstable. In the context of comparing the performance of two workloads, we propose a measurement procedure that improves accuracy by executing the workloads concurrently and using the paired measurements to filter out outside interference. Depending on the platform used, experiments on workloads from the ScalaBench suite running with the Graal compiler show an average accuracy improvement ranging from 114% to 683% over sequential measurements.
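
As a rough illustration of the duet idea, the sketch below pairs iterations of the two workloads that overlapped in time and summarizes the ratio of their iteration durations with a percentile bootstrap confidence interval. This is a minimal sketch under stated assumptions, not the paper's actual analysis: the function names, the 50% overlap threshold, and the bootstrap settings are all illustrative choices.

```python
import random
import statistics

# Hypothetical duet data: each sample is a (start_time, end_time) pair for one
# iteration of a workload; workloads A and B ran concurrently on the same machine.

def overlap(a, b):
    """Length of the time interval shared by two iterations."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def paired_ratios(samples_a, samples_b, min_overlap=0.5):
    """Pair iterations of A and B that overlap in time and return the ratios
    of their durations. Overlapping iterations are exposed to the same outside
    interference, so the ratio is less sensitive to it than raw durations are.
    Assumes all iteration durations are positive."""
    ratios = []
    for a in samples_a:
        dur_a = a[1] - a[0]
        for b in samples_b:
            dur_b = b[1] - b[0]
            # Require the pair to share most of the longer iteration.
            if overlap(a, b) >= min_overlap * max(dur_a, dur_b):
                ratios.append(dur_a / dur_b)
    return ratios

def bootstrap_ci(values, n_resamples=10_000, alpha=0.05):
    """Percentile bootstrap confidence interval for the mean of values."""
    means = sorted(
        statistics.mean(random.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

In a real duet run, samples_a and samples_b would come from timestamped iteration logs of the two workload versions; a mean ratio whose confidence interval excludes 1.0 would then suggest a genuine performance difference rather than cloud noise.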
