Full-System Critical Path Analysis

Many interesting workloads today are limited not by CPU processing power but by the interactions between the CPU, memory system, I/O devices, and the complex software that ties all the components together. Optimizing these workloads requires identifying performance bottlenecks across concurrent hardware components and across multiple layers of software. Common software profiling techniques cannot account for hardware bottlenecks or situations where software overheads are hidden due to overlap with hardware operations. Critical-path analysis is a powerful approach for identifying bottlenecks in highly concurrent systems, but typically requires detailed domain knowledge to construct the required event dependence graphs. As a result, to date it has been applied only to isolated system layers (e.g., processor microarchitectures or message-passing applications). In this paper we present a novel technique for applying critical-path analysis to complex systems composed of numerous interacting state machines. We avoid tedious up-front modeling by using control-flow tracing to expose implicit software state machines automatically, and iterative refinement to add necessary manual annotations with minimal effort. By applying our technique within a full-system simulator, we achieve an integrated trace of hardware and software events with minimal perturbation. As a result, we can perform this analysis across the user/kernel and hardware/software boundaries and even across multiple systems. We apply this technique to analyzing network performance, and show that we are able to find performance bottlenecks in both hardware and software, including some surprising bottlenecks in the Linux 2.6.13 kernel.
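As an illustration of the underlying idea only, critical-path analysis treats a trace as a directed acyclic graph of events whose edges carry latencies; the critical path is the longest-latency chain through that graph. The sketch below is not the paper's tool. The function name `critical_path`, the event names, and the latencies are hypothetical and chosen just to show how a hardware event can end up on the critical path while overlapped software work does not.

```python
from collections import defaultdict

def critical_path(edges):
    """Return (total_latency, path) for the longest path in a dependence DAG.

    edges: iterable of (src_event, dst_event, latency) tuples.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for src, dst, lat in edges:
        succ[src].append((dst, lat))
        indeg[dst] += 1
        nodes.update((src, dst))
    if not nodes:
        return 0, []

    # Process events in topological order (Kahn's algorithm), tracking the
    # longest distance to each event and the predecessor that produced it.
    ready = [n for n in nodes if indeg[n] == 0]
    dist = {n: 0 for n in nodes}
    pred = {}
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for m, lat in succ[n]:
            if dist[n] + lat > dist[m]:
                dist[m] = dist[n] + lat
                pred[m] = n
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if visited != len(nodes):
        raise ValueError("dependence graph contains a cycle")

    # Walk back from the latest-finishing event to recover the path.
    end = max(nodes, key=lambda n: dist[n])
    path = [end]
    while path[-1] in pred:
        path.append(pred[path[-1]])
    return dist[end], path[::-1]

# Hypothetical receive-side events; latencies are made-up units.
EDGES = [
    ("nic_rx", "dma_done", 4),
    ("dma_done", "irq_handler", 2),
    ("irq_handler", "tcp_rx", 5),
    ("dma_done", "cache_miss", 9),
    ("cache_miss", "tcp_rx", 1),
    ("tcp_rx", "user_copy", 3),
]

total, path = critical_path(EDGES)
print(total, path)
```

In this made-up trace the interrupt handler overlaps with the cache miss, so a software-only profile would attribute time to `irq_handler` even though shortening it would not reduce end-to-end latency.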
