PerfGuard: binary-centric application performance monitoring in production environments

Diagnosis of performance problems is an essential part of software development and maintenance. It is a particularly challenging problem in production environments, where only program binaries are available and knowledge of the source code is limited or nonexistent. The problem is compounded by the integration of a large amount of third-party software into most large-scale applications. Existing approaches either require source code to embed manually constructed logic that identifies performance problems, or support only a limited scope of applications after prior manual analysis. This paper proposes an automated approach that analyzes application binaries and transparently instruments the binary code to inject and apply performance assertions on application transactions. Our evaluation on a set of large-scale application binaries, without access to source code, automatically discovered 10 publicly known real-world performance bugs and shows that PerfGuard introduces very low overhead (less than 3% on the Apache and MySQL servers) to production systems.
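To make the idea of an injected performance assertion concrete, the following C++ sketch shows one plausible form such a check could take: the instrumentation layer times a transaction-handling code region and flags latencies that exceed an online mean-plus-three-sigma baseline. This is a hypothetical illustration, not PerfGuard's actual implementation; the names PerfAssertion and ScopedGuard, the 30-sample warm-up, and the three-sigma threshold are all assumptions made for the example.

    #include <chrono>
    #include <cmath>
    #include <cstdio>
    #include <string>

    // Hypothetical sketch: one assertion guards one transaction unit.
    // It learns a latency baseline online (Welford's algorithm) and
    // flags observations beyond mean + 3*stddev.
    struct PerfAssertion {
        std::string unit;     // name of the guarded code region
        double mean = 0.0;    // running mean of observed latencies (us)
        double m2 = 0.0;      // running sum of squared deviations
        long long n = 0;      // number of observations so far

        void observe(double latency_us) {
            ++n;
            double delta = latency_us - mean;
            mean += delta / n;
            m2 += delta * (latency_us - mean);
            // Assumption: only assert once a small baseline exists.
            if (n > 30) {
                double stddev = std::sqrt(m2 / (n - 1));
                if (latency_us > mean + 3.0 * stddev)
                    std::fprintf(stderr,
                        "[perf-assert] %s: %.0f us exceeds %.0f us\n",
                        unit.c_str(), latency_us, mean + 3.0 * stddev);
            }
        }
    };

    // The binary instrumentation would wrap the entry and exit of each
    // transaction-handling function with a guard like this (hypothetical).
    struct ScopedGuard {
        PerfAssertion& a;
        std::chrono::steady_clock::time_point start;
        explicit ScopedGuard(PerfAssertion& assertion)
            : a(assertion), start(std::chrono::steady_clock::now()) {}
        ~ScopedGuard() {
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                std::chrono::steady_clock::now() - start).count();
            a.observe(static_cast<double>(us));
        }
    };

In this reading, injecting an assertion amounts to rewriting a function in the binary so that a ScopedGuard is constructed on entry and destroyed on every exit path, which keeps the per-transaction overhead to one clock read and a constant-time statistics update.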
