An Experience Report of Generating Load Tests Using Log-Recovered Workloads at Varying Granularities of User Behaviour

Designing field-representative load tests is an essential step in the quality assurance of large-scale systems. Practitioners may capture user behaviour at different levels of granularity. A coarse-grained load test may miss detailed user behaviour, leading to a non-representative load test, while an extremely fine-grained load test would simply replay user actions step by step, producing tests that are costly to develop, execute, and maintain. Workload recovery is at the core of such load tests. Prior research often captures the workload as the frequency of user actions. However, much valuable information lies in the context and sequences of user actions, and such richer information helps ensure that load tests which leverage the recovered workloads are more field-representative. In this experience paper, we study the use of different granularities of user behaviour, i.e., basic user actions, basic user actions with contextual information, and user action sequences with contextual information, when recovering workloads for use in the load testing of large-scale systems. We propose three approaches that are based on these three granularities of user behaviour and evaluate them on four subject systems, namely Apache James, OpenMRS, Google Borg, and an ultra-large-scale industrial system (SA) from Alibaba. Our results show that the approach based on user action sequences with contextual information outperforms the other two approaches and can generate more representative load tests whose throughput and CPU usage are similar to those of the original field workload (i.e., the differences are mostly statistically insignificant or have small/trivial effect sizes). Such representative load tests are generated from only a small number of user clusters, leading to a low cost of conducting and maintaining the tests. Finally, we demonstrate that our approaches can detect injected users in the original field workloads with high precision and recall. Our paper demonstrates the importance of user action sequences with contextual information in the workload recovery of large-scale systems.
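To make the three granularities of user behaviour concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of how a user's recovered log actions might be abstracted into (1) action frequencies, (2) contextual action frequencies, and (3) an action sequence with contextual information. All names here (Action, action names such as "SEND_MAIL", and the context labels) are hypothetical.

```python
# Illustrative sketch of the three granularities of user behaviour; the
# concrete action and context names are hypothetical, not taken from the paper.
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Action:
    name: str      # e.g., "SEND_MAIL"
    context: str   # e.g., a discretized attribute such as mail size "LARGE"

def action_frequencies(actions: List[Action]) -> Counter:
    """Granularity 1: basic user actions, kept only as frequency counts."""
    return Counter(a.name for a in actions)

def contextual_frequencies(actions: List[Action]) -> Counter:
    """Granularity 2: basic user actions enriched with contextual information."""
    return Counter((a.name, a.context) for a in actions)

def contextual_sequence(actions: List[Action]) -> Tuple[Tuple[str, str], ...]:
    """Granularity 3: the ordered sequence of actions with contextual information."""
    return tuple((a.name, a.context) for a in actions)

# Example: one user's session recovered from execution logs.
session = [Action("LOGIN", "OK"), Action("SEND_MAIL", "LARGE"), Action("LOGOUT", "OK")]
print(action_frequencies(session))     # {'LOGIN': 1, 'SEND_MAIL': 1, 'LOGOUT': 1}
print(contextual_frequencies(session)) # {('LOGIN', 'OK'): 1, ...}
print(contextual_sequence(session))    # (('LOGIN', 'OK'), ('SEND_MAIL', 'LARGE'), ('LOGOUT', 'OK'))
```

Under this framing, users with similar representations can be grouped (e.g., by clustering), and one representative workload per cluster can drive a load test, which is what keeps the number of distinct simulated user behaviours small.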
