Performance regression testing of concurrent classes

Developers of thread-safe classes struggle with two opposing goals. The class must be correct, which requires synchronizing concurrent accesses, and the class should provide reasonable performance, which is difficult to achieve in the presence of unnecessary synchronization. Validating the performance of a thread-safe class is challenging: it requires diverse workloads that exercise the class, existing performance analysis techniques focus on individual bottleneck methods, and reliably measuring the performance of concurrent executions is difficult. This paper presents SpeedGun, an automatic performance regression testing technique for thread-safe classes. The key idea is to generate multi-threaded performance tests and to compare the performance of two versions of a class. The analysis notifies developers when a change to a thread-safe class significantly influences the performance of the class's clients. An evaluation with 113 pairs of classes from popular Java projects shows that the analysis effectively identifies 13 performance differences, including performance regressions that the respective developers were not aware of.
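
To make the idea concrete, below is a minimal sketch of what a generated multi-threaded performance test might look like. This is an illustration, not the paper's actual output: the class under test (java.util.concurrent.ConcurrentHashMap stands in here for the thread-safe class), the thread count, and the call sequence are placeholder assumptions. SpeedGun generates such tests automatically, runs them against both the old and the new version of the class, and compares the measured timings.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a generated multi-threaded performance test.
// In the SpeedGun setting, the same test would be executed against two
// versions of the thread-safe class and the timings compared.
public class GeneratedPerfTest {

    static final int THREADS = 4;
    static final int OPS_PER_THREAD = 100_000;
    static final int REPETITIONS = 10; // repeat runs to reduce measurement noise

    public static void main(String[] args) throws Exception {
        for (int rep = 0; rep < REPETITIONS; rep++) {
            System.out.printf("run %d: %.2f ms%n", rep, runOnce() / 1e6);
        }
    }

    // One timed run: all threads start together (barrier), each executes
    // a generated sequence of calls on a shared instance of the class.
    static long runOnce() throws Exception {
        ConcurrentHashMap<Integer, Integer> shared = new ConcurrentHashMap<>();
        CyclicBarrier barrier = new CyclicBarrier(THREADS + 1);
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);

        for (int t = 0; t < THREADS; t++) {
            final int seed = t;
            pool.submit(() -> {
                try {
                    barrier.await(); // wait until all threads are ready
                    for (int i = 0; i < OPS_PER_THREAD; i++) {
                        // generated call sequence (placeholder operations)
                        shared.put(seed * OPS_PER_THREAD + i, i);
                        shared.get(i);
                        shared.remove(i / 2);
                    }
                    barrier.await(); // signal completion
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
        }

        barrier.await();                          // release all threads
        long start = System.nanoTime();
        barrier.await();                          // wait for all threads to finish
        long elapsed = System.nanoTime() - start;

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return elapsed;
    }
}
```

The barrier ensures the threads begin their call sequences at roughly the same time, so the measurement covers genuinely concurrent execution rather than thread startup; repeating the run several times is a first step toward the kind of statistically rigorous measurement of concurrent executions that the paper identifies as difficult.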
