A statistics-based performance testing methodology for cloud applications

The low cost of resource ownership and the flexibility of cloud services have led users to increasingly port their applications to the cloud. To fully realize the cost benefits of cloud services, users usually need to know the execution performance of their applications reliably. However, because of the random performance fluctuations cloud applications experience, the black-box nature of public clouds, and cloud usage costs, testing on clouds to acquire accurate performance results is extremely difficult. In this paper, we present a novel cloud performance testing methodology called PT4Cloud. By employing two non-parametric statistical approaches, likelihood theory and the bootstrap method, PT4Cloud provides reliable stop conditions for obtaining highly accurate performance distributions with confidence bands. These statistical approaches also allow users to specify intuitive accuracy goals and to trade accuracy off against testing cost. We evaluated PT4Cloud with 33 benchmark configurations on the Amazon Web Services and Chameleon clouds. Compared with performance data obtained from extensive performance tests, PT4Cloud produced results with 95.4% accuracy on average while reducing the number of test runs by 62%. We also propose two test-execution reduction techniques for PT4Cloud that cut the number of test runs by 90.1% while retaining an average accuracy of 91%. Compared against three existing techniques, PT4Cloud produces substantially more accurate results.
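
The abstract does not spell out PT4Cloud's actual algorithm, so the Python sketch below only illustrates the general kind of procedure it describes: run performance tests in batches, estimate the run-time distribution non-parametrically, stop once consecutive estimates agree, and report a bootstrap confidence band. The callback name `run_batch`, the function `distributions_close`, the total-variation stop metric, and all parameter values are assumptions made for illustration, not the paper's method.

```python
# Minimal sketch (not PT4Cloud itself): batch testing with a
# distribution-convergence stop condition and a bootstrap confidence band.
import numpy as np
from scipy.stats import gaussian_kde


def bootstrap_band(samples, n_boot=1000, grid_size=200, alpha=0.05, rng=None):
    """Bootstrap a (1 - alpha) confidence band around the KDE of run times."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(samples, dtype=float)
    grid = np.linspace(samples.min(), samples.max(), grid_size)
    densities = np.empty((n_boot, grid_size))
    for b in range(n_boot):
        resample = rng.choice(samples, size=samples.size, replace=True)
        densities[b] = gaussian_kde(resample)(grid)
    lower = np.percentile(densities, 100 * alpha / 2, axis=0)
    upper = np.percentile(densities, 100 * (1 - alpha / 2), axis=0)
    return grid, lower, upper


def distributions_close(old_samples, all_samples, grid_size=200, tol=0.05):
    """Assumed stop condition: the estimated distribution barely changes when
    a new batch is added, measured by total-variation distance between KDEs."""
    grid = np.linspace(np.min(all_samples), np.max(all_samples), grid_size)
    p = gaussian_kde(old_samples)(grid)
    q = gaussian_kde(all_samples)(grid)
    dx = grid[1] - grid[0]
    return 0.5 * np.sum(np.abs(p - q)) * dx < tol


def run_until_stable(run_batch, tol=0.05, max_batches=50):
    """run_batch() is a user-supplied callback that executes one batch of
    test runs in the cloud and returns their measured execution times
    (each batch is assumed to contain several distinct timings)."""
    samples = np.asarray(run_batch(), dtype=float)
    for _ in range(max_batches):
        new = np.asarray(run_batch(), dtype=float)
        grown = np.concatenate([samples, new])
        if distributions_close(samples, grown, tol=tol):
            return bootstrap_band(grown)
        samples = grown
    return bootstrap_band(samples)  # test budget exhausted; report what we have
```

A real implementation would also have to decide how test runs are spread over time and across VM instances to capture the cloud's performance fluctuations; the sketch leaves that entirely to the `run_batch` callback.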
