An Exploratory Study of Performance Regression Introducing Code Changes

Performance is an important aspect of software quality. In fact, failures of large software systems are often due to performance issues rather than functional bugs. One of the most important performance issues is performance regression, examples of which include response time degradation and increased resource utilization. Although not all performance regressions are bugs, they often have a direct impact on users' experience of the system. Because of the potentially large impact of performance regressions, prior research proposes various automated approaches that detect them. However, such detection is conducted after the fact, i.e., after the system is built and deployed in the field or in dedicated performance testing environments. On the other hand, there exists rich software quality research that examines the impact of code changes on software quality; yet the majority of prior findings do not use performance regressions as a sign of software quality degradation. In this paper, we perform an exploratory study on the source code changes that introduce performance regressions. We conduct a statistically rigorous performance evaluation of 1,126 commits from ten releases of Hadoop and 135 commits from five releases of RxJava. In particular, we repetitively run tests and performance micro-benchmarks for each commit while measuring response time, CPU usage, memory usage, and I/O traffic. We consider a test or performance micro-benchmark to expose a performance regression if any performance metric shows a statistically significant degradation with a medium or large effect size. We find that performance regressions widely exist during the development of both subject systems. By manually examining the issue reports associated with the identified regression-introducing commits, we find that the majority of the performance regressions are introduced while fixing other bugs. In addition, we identify six root causes of performance regressions; 12.5% of the examined performance regressions can be avoided or have their impact reduced during development. Our findings highlight the need for performance assurance activities during development. Developers should address avoidable performance regressions and be aware of the impact of unavoidable ones.
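
To make the detection criterion concrete, the sketch below shows one way to flag a regression between repeated pre- and post-commit measurements of a single performance metric, combining a significance test with an effect-size threshold as described above. It is a minimal sketch assuming a Mann-Whitney U test and Cliff's delta with conventional thresholds; the study's exact statistical test, effect-size measure, and the is_regression helper shown here are illustrative assumptions, not necessarily the paper's procedure.

    # Minimal sketch: flag a performance regression for one metric (e.g., response
    # time in ms) only if the post-commit measurements are significantly worse AND
    # the effect size is at least "medium". Test and effect-size choices are
    # illustrative assumptions, not necessarily the paper's exact setup.
    from scipy.stats import mannwhitneyu

    def cliffs_delta(before, after):
        # Cliff's delta: share of (before, after) pairs where 'after' is larger,
        # minus the share where it is smaller; ranges from -1 to 1.
        gt = sum(1 for b in before for a in after if a > b)
        lt = sum(1 for b in before for a in after if a < b)
        return (gt - lt) / (len(before) * len(after))

    def is_regression(before, after, alpha=0.05, medium=0.33):
        # One-sided Mann-Whitney U test: are post-commit values stochastically larger?
        _, p_value = mannwhitneyu(after, before, alternative="greater")
        # 0.33 is the conventional lower bound for a "medium" Cliff's delta,
        # so this accepts medium or large effects only.
        return p_value < alpha and cliffs_delta(before, after) >= medium

    # Example: 30 repeated response-time measurements around one commit.
    before = [101, 99, 100, 102, 98, 100, 101, 99, 100, 102] * 3
    after = [109, 111, 110, 108, 112, 110, 109, 111, 110, 108] * 3
    print(is_regression(before, after))  # True: significant slowdown, large effect

In practice such a check would be applied per test or micro-benchmark and per metric (response time, CPU, memory, I/O), with a regression reported if any metric triggers it.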
