A Case Study on the Stability of Performance Tests for Serverless Applications

Context. In serverless computing, application resource management and operational concerns are largely delegated to the cloud provider; however, ensuring that serverless applications meet their performance requirements remains the developers' responsibility. Performance testing is a commonly used performance assessment practice, but it traditionally requires visibility of the resource environment.

Objective. In this study, we investigate whether performance tests of serverless applications are stable, that is, whether their results are reproducible, and what implications the serverless paradigm has for performance tests.

Method. We conduct a case study in which we collect two datasets of performance test results: (a) repetitions of performance tests for varying memory sizes and load intensities, and (b) three repetitions of the same performance test every day for ten months.

Results. We find that performance tests of serverless applications are comparatively stable if conducted on the same day. However, we also observe short-term performance variations and frequent long-term performance changes.

Conclusion. Performance tests for serverless applications can be stable; however, the serverless model impacts the planning, execution, and analysis of performance tests.
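To make the study design concrete, the sketch below outlines the shape of dataset (a): each combination of memory size and load intensity is tested repeatedly, and the spread of the results across repetitions serves as a stability indicator. This is a minimal Python illustration under our own assumptions; run_performance_test is a hypothetical stand-in for a real load driver, and the specific memory sizes, load intensities, and coefficient-of-variation metric are illustrative choices, not taken from the study.

    import random
    import statistics

    MEMORY_SIZES_MB = [256, 512, 1024]    # illustrative configuration axis
    LOAD_INTENSITIES_RPS = [10, 50, 100]  # illustrative load axis
    REPETITIONS = 3                       # each test is repeated

    def run_performance_test(memory_mb, load_rps):
        """Hypothetical driver. A real study would deploy the function with
        memory_mb and drive it at load_rps requests/s; here we simulate
        latencies (in ms) so the sketch runs end to end."""
        base = 200.0 * 256 / memory_mb  # toy model: more memory, lower latency
        return [random.gauss(base, base * 0.1) for _ in range(load_rps)]

    def coefficient_of_variation(samples):
        """Relative dispersion (stdev / mean); lower means more reproducible."""
        return statistics.stdev(samples) / statistics.mean(samples)

    stability = {}
    for mem in MEMORY_SIZES_MB:
        for rps in LOAD_INTENSITIES_RPS:
            # Mean latency of each repetition; the spread across repetitions
            # indicates how stable (reproducible) the test result is.
            means = [statistics.mean(run_performance_test(mem, rps))
                     for _ in range(REPETITIONS)]
            stability[(mem, rps)] = coefficient_of_variation(means)

    for (mem, rps), cv in sorted(stability.items()):
        print(f"{mem:>5} MB @ {rps:>3} rps: CV of mean latency = {cv:.3f}")

Dataset (b) would reuse the same measurement step with one fixed configuration, run three times per day over ten months.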
