Performance regression testing target prioritization via performance risk analysis

As software evolves, problematic changes can significantly degrade performance, i.e., introduce performance regressions. Performance regression testing is an effective way to reveal such issues at an early stage, but because of its high overhead it is usually performed infrequently. Consequently, when a performance regression is spotted, multiple commits may have been merged since the last test run, and developers must spend extra time and effort narrowing down which commit caused the problem. Existing efforts improve performance regression testing efficiency through test case reduction or prioritization. In this paper, we propose a new lightweight, white-box approach, performance risk analysis (PRA), that improves performance regression testing efficiency via testing target prioritization. The analysis statically evaluates a given source code commit's risk of introducing a performance regression. Performance regression testing can then leverage the analysis result to test high-risk commits first while delaying or skipping tests on low-risk commits. To validate the feasibility of this idea, we conduct a study on 100 real-world performance regression issues from three widely used open-source projects. Guided by insights from the study, we design PRA and build a tool, PerfScope. Evaluated on the examined problematic commits, our tool successfully flags 91% of them. Moreover, on 600 randomly picked new commits from six large-scale software projects, developers using our tool need to test only 14-22% of the 600 commits and can still catch 87-95% of the commits that introduce performance regressions.
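The abstract does not include an implementation, so the sketch below is only a minimal illustration of the prioritization workflow it describes, not the authors' PerfScope analysis. The `Commit` fields, the risk indicators, and their weights are hypothetical stand-ins for the kinds of change features a static risk analysis might extract from a commit.

```python
# Minimal sketch of risk-based commit prioritization in the spirit of PRA.
# The indicators and weights here are hypothetical; the paper's PerfScope
# performs a much richer static analysis of each commit.

from dataclasses import dataclass

@dataclass
class Commit:
    sha: str
    touches_loop_body: bool = False    # change lands inside a loop
    adds_expensive_call: bool = False  # e.g., new I/O or lock acquisition
    fan_in: int = 0                    # how many callers reach the changed code

def risk_score(c: Commit) -> float:
    """Heuristic stand-in for PRA: weight change features that plausibly
    correlate with performance regressions."""
    score = 0.0
    if c.touches_loop_body:
        score += 3.0
    if c.adds_expensive_call:
        score += 2.0
    score += 0.1 * c.fan_in  # widely reached code is riskier to change
    return score

def prioritize(commits: list[Commit], budget: int) -> list[Commit]:
    """Rank commits by descending risk and keep only as many as the
    performance-testing budget allows; the rest are deferred or skipped."""
    return sorted(commits, key=risk_score, reverse=True)[:budget]

if __name__ == "__main__":
    history = [
        Commit("a1b2c3", touches_loop_body=True, fan_in=40),
        Commit("d4e5f6", adds_expensive_call=True),
        Commit("778899"),  # no flagged indicators: low risk
    ]
    for c in prioritize(history, budget=2):
        print(c.sha, risk_score(c))
```

With a testing budget of two, the two flagged commits are tested immediately and the unflagged one is deferred to a later batched run, mirroring the test-first/delay/skip policy the abstract describes.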
