Performance and Environment Monitoring for Whole-System Characterization and Optimization

As the performance gains achievable through chip fabrication technology reach their limits, other areas of system design need to be explored. Several different possibilities exist. Our research in the context of the DARPA HPCS project PERCS [11] aims at an infrastructure to characterize and understand the interactions between hardware and software and to effect optimizations based on those characterizations. To achieve this, we have designed and implemented a performance and environment monitoring (PEM) infrastructure that vertically integrates performance events from various layers in the execution stack. The performance understanding achieved with PEM can be used to help tune application behavior on existing systems, or potentially to improve future architecture designs by analyzing the PEM data collected on a whole-system simulator while varying architecture characteristics. We have developed an architecture for continuous program optimization (CPO) to assist in, and automate, the challenging task of performance tuning a system. CPO uses the data provided by PEM to detect, diagnose, and eliminate performance problems. We designed and implemented a PEM prototype that feeds the vertical event stream to a performance visualizer, our first PEM client. We describe the CPO architecture and how PEM interacts with CPO. We then present an experiment that uses the PEM visualization client to understand data gathered across multiple layers of the system, and show how that data was used to improve system performance.

Author affiliations: University of Colorado at Boulder; IBM T. J. Watson Research Center; University of Toronto.

Work supported by Defense Advanced Research Projects Agency Contract NBCH30390004.

[1] Doug Kimelman, et al. Strata-various: multi-layer visualization of dynamics in software system behavior, 1994, Proceedings Visualization '94.

[2] B. Miller, et al. The Paradyn Parallel Performance Measurement Tools, 1995.

[3] S. Turner, et al. Performance Analysis Using the MIPS R10000 Performance Counters, 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[4] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.

[5] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98.

[6] Barton P. Miller, et al. Fine-grained dynamic instrumentation of commodity operating system kernels, 1999, OSDI '99.

[7] Daniel A. Reed, et al. SvPablo: A multi-language architecture-independent performance analysis system, 1999, Proceedings of the 1999 International Conference on Parallel Processing.

[8] Dilma Da Silva, et al. An infrastructure for multiprocessor run-time adaptation, 2002, WOSS '02.

[9] Alan L. Cox, et al. Practical, transparent operating system support for superpages, 2002, OPSR.

[10] Dilma Da Silva, et al. System Support for Online Reconfiguration, 2003, USENIX Annual Technical Conference, General Track.

[11] John Aycock, et al. A brief history of just-in-time, 2003, CSUR.

[12] Sandhya Dwarkadas, et al. Characterizing and predicting program behavior and its variability, 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[13] R.W. Wisniewski, et al. Efficient, Unified, and Scalable Performance Monitoring for Multiprocessor Operating Systems, 2003, ACM/IEEE SC 2003 Conference (SC'03).

[14] Barton P. Miller, et al. CrossWalk: A Tool for Performance Profiling Across the User-Kernel Boundary, 2003, PARCO.

[15] Matthias Hauswirth, et al. Vertical profiling: understanding the behavior of object-oriented applications, 2004, OOPSLA.

[16] David A. Padua, et al. A dynamically tuned sorting library, 2004, International Symposium on Code Generation and Optimization, CGO 2004.

[17] Bryan Cantrill, et al. Dynamic Instrumentation of Production Systems, 2004, USENIX Annual Technical Conference, General Track.

[18] Meng Dan, et al. High Productivity Computing Systems, 2005.