Modern HPC cluster virtualization using KVM and Palacios

In this paper we explore the potential of applying virtualization to High Performance Computing (HPC). We demonstrate the importance of proper NUMA architecture emulation when running HPC tasks inside virtual machines on multiple NUMA hosts. We assess the KVM/QEMU and Palacios hypervisors and show that, with proper tuning of the hypervisor (including NUMA emulation), the performance degradation on many tests from the HPC Challenge and NAS Parallel Benchmark suites drops from 10–60% to 1–5%. All tests are performed on a modern HPC cluster with a high-speed InfiniBand interconnect. The cluster nodes are 2-socket 12-core systems, and up to 8 nodes were used for computation. Comparing the two hypervisors, we conclude that the results with NUMA emulation enabled are generally similar: KVM provides more stable and predictable results, while Palacios is much better on fine-grained tests at a large scale but shows abnormal performance degradation on a few tests. We believe that the main performance advantage of Palacios is the reduced amount of noise generated by the virtualization system. This advantage becomes more important as the scale of the system grows.
