Catch me if you can: performance bug detection in the wild

Profilers help developers find and fix performance problems. But do they find performance bugs -- performance problems that real users actually notice? In this paper we argue that -- especially in the case of interactive applications -- traditional profilers find irrelevant problems but fail to find relevant bugs. We then introduce lag hunting, an approach that identifies perceptible performance bugs by monitoring the behavior of applications deployed in the wild. The approach transparently produces a list of performance issues, and for each issue it provides the developer with information that helps in finding the cause of the problem. We evaluate our approach with an experiment in which we monitor an application used by 24 users for 1958 hours over the course of three months. We characterize the resulting 881 issues, and we find and fix the causes of a representative set of them.
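
The core mechanism -- timing how long the GUI thread takes to handle each event and reporting episodes that exceed a perceptibility threshold -- can be illustrated with a minimal sketch. The Java/Swing code below is an assumption-laden illustration, not the paper's actual instrumentation: the class name LagMonitor, the 100 ms cutoff (a common HCI guideline), and the logging destination are all our choices.

```java
import java.awt.AWTEvent;
import java.awt.EventQueue;
import java.awt.Toolkit;

/**
 * Minimal latency monitor: times the dispatch of every AWT/Swing event
 * and reports episodes longer than a perceptibility threshold.
 */
public class LagMonitor extends EventQueue {

    // Assumed perceptibility cutoff; the real threshold may differ.
    private static final long THRESHOLD_MS = 100;

    public static void install() {
        // Replace the default queue so every event passes through dispatchEvent below.
        Toolkit.getDefaultToolkit().getSystemEventQueue().push(new LagMonitor());
    }

    @Override
    protected void dispatchEvent(AWTEvent event) {
        long start = System.nanoTime();
        super.dispatchEvent(event);  // run the application's own event handlers
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (elapsedMs > THRESHOLD_MS) {
            // In a deployed setting this record would be logged and shipped
            // to a server for aggregation into performance issues.
            System.err.println("Perceptible lag: " + elapsedMs
                    + " ms while dispatching " + event);
        }
    }
}
```

Calling LagMonitor.install() early in the application's main method is enough to observe every event dispatched on the event dispatch thread, without modifying the application's own code.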
