Experiences with a Lightweight Supercomputer Kernel: Lessons Learned from Blue Gene's CNK

The petascale era has recently been ushered in, and many researchers have already turned their attention to the challenges of exascale computing. To achieve petascale computing, two broad approaches to kernel design were taken: a lightweight approach embodied by IBM Blue Gene's CNK, and a more full-weight approach embodied by Cray's CNL. Each approach has strengths and weaknesses. Examining the current generation can provide insight into what mechanisms may be needed for the exascale generation. The contribution of this paper is an account of our experiences with CNK on Blue Gene/P. We demonstrate that it is possible to implement a small, lightweight kernel that scales well but still provides a Linux environment and the functionality desired by HPC programmers. Such an approach offers reproducibility, low noise, high and stable performance, reliability, and ease of effectively exploiting unique hardware features. We describe the strengths and weaknesses of this approach.
