Designing OS for HPC Applications: Scheduling

Operating systems have historically been implemented as independent layers between hardware and applications. User programs communicate with the OS through a set of well-defined system calls and do not have direct access to the hardware. The OS, in turn, communicates with the underlying architecture via control registers. Apart from these interfaces, the three layers are practically oblivious to each other. While this structure improves portability and transparency, it may not deliver optimal performance. This is especially true for High Performance Computing (HPC) systems, where modern parallel applications and multi-core architectures pose new challenges in terms of performance, power consumption, and system utilization. The hardware, the OS, and the applications can no longer remain isolated; instead, they should cooperate to deliver high performance with minimal power consumption. In this paper, we present our experience with the design and implementation of High Performance Linux (HPL), an operating system designed to optimize the performance of HPC applications running on a state-of-the-art compute cluster. We show how characterizing parallel applications through hardware and software performance counters drives the design of the OS, and how including knowledge about the architecture improves performance and efficiency. We perform experiments on a dual-socket IBM POWER6 machine, showing performance improvements and stability (performance variation of 2.11% on average) for NAS, a widely used parallel benchmark suite.
