Virtualizing HPC applications using modern hypervisors

In this paper we explore the prospects of applying virtualization technologies to high performance computing tasks. We use an extensive set of HPC benchmarks to evaluate virtualization overhead, including HPC Challenge, NAS Parallel Benchmarks, and SPEC MPI2007. We assess the KVM and Palacios hypervisors and, with proper hypervisor tuning, reduce the performance degradation from 10-60% to 1-5% in many cases at core counts of up to 240. At the same time, a few tests still show overhead ranging from 20% to 45% even with our enhancements. We describe the techniques necessary to achieve sufficient performance: host OS tuning to decrease the noise level, nested paging with large pages for efficient guest memory allocation, and proper NUMA architecture emulation when running virtual machines on NUMA hosts. Comparing the KVM/QEMU and Palacios hypervisors, we conclude that, in general, the results with proper tuning are similar: KVM provides more stable and predictable results, while Palacios performs much better on fine-grained tests at large scale but shows abnormal performance degradation on a few tests.
