The ghost in the machine: observing the effects of kernel operation on parallel application performance

The performance of a parallel application on a scalable HPC system is determined by user-level execution of the application code and system-level (OS kernel) operations. To understand the influences of system-level factors on application performance, the measurement of OS kernel activities is key. We describe a technology to observe kernel actions and make this information available to application-level performance measurement tools. The benefits of merged application and OS performance information and its use in parallel performance analysis are demonstrated, both for profiling and tracing methodologies. In particular, we focus on the problem of kernel noise assessment as a stress test of the approach. We show new results for characterizing noise and introduce new techniques for evaluating noise interference and its effects on application execution. Our kernel measurement and noise analysis technologies are being developed as part of Linux OS environments for scalable parallel systems.
