Early Experiences with KTAU on the IBM BG/L

The influences of OS and system-specific effects on application performance are increasingly important in high performance computing. In this regard, OS kernel measurement is necessary to understand the interrelationship of system and application behavior. This can be viewed from two perspectives: kernel-wide and process-centric. An integrated methodology and framework to observe both views in HPC systems using OS kernel measurement has remained elusive. We demonstrate a new tool called KTAU (Kernel TAU) that aims to provide parallel kernel performance measurement from both perspectives. KTAU extends the TAU performance system with kernel-level monitoring, while leveraging TAU's measurement and analysis capabilities. As part of the ZeptoOS scalable operating systems project, we report early experiences using KTAU in ZeptoOS on the IBM BG/L system.

[1]  Bryan Cantrill,et al.  Dynamic Instrumentation of Production Systems , 2004, USENIX Annual Technical Conference, General Track.

[2]  Barton P. Miller,et al.  CrossWalk: A Tool for Performance Profiling Across the User-Kernel Boundary , 2003, PARCO.

[3]  R.W. Wisniewski,et al.  Efficient, Unified, and Scalable Performance Monitoring for Multiprocessor Operating Systems , 2003, ACM/IEEE SC 2003 Conference (SC'03).

[4]  TsafrirDan,et al.  Fine grained kernel logging with KLogger , 2007 .

[5]  George L.-T. Chiu,et al.  Overview of the Blue Gene/L system architecture , 2005, IBM J. Res. Dev..

[6]  F. Petrini,et al.  The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, ACM/IEEE SC 2003 Conference (SC'03).

[7]  J. Fier,et al.  Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System , 2003, ACM/IEEE SC 2003 Conference (SC'03).

[8]  José E. Moreira,et al.  Blue Gene/L programming and operating environment , 2005, IBM J. Res. Dev..

[9]  Dan Tsafrir,et al.  Fine grained kernel logging with KLogger: experience and insights , 2007, EuroSys '07.

[10]  Michel Dagenais,et al.  Measuring and Characterizing System Behavior Using Kernel-Level Event Logging , 2000, USENIX Annual Technical Conference, General Track.

[11]  David F. Heidel,et al.  An Overview of the BlueGene/L Supercomputer , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[12]  Vivek S. Pai,et al.  Proceedings of the General Track: 2004 Usenix Annual Technical Conference Making the " Box " Transparent: System Call Performance as a First-class Result , 2022 .

[13]  Barton P. Miller,et al.  Fine-grained dynamic instrumentation of commodity operating system kernels , 1999, OSDI '99.

[14]  Arthur B. Maccabe,et al.  A Framework for Analyzing Linux System Overheads on HPC Applications ∗ , 2005 .