The Case for Partitioning Virtual Machines on Multicore Architectures

In this paper we argue that partitioning is required to attain the best performance for scientific applications running on virtual machines. Current memory management and I/O handling techniques impose high overhead on these applications. Using KVM, we quantify this impact on applications written in multiple paradigms: message passing, shared memory, and partitioned global address space. Our analysis shows that on NUMA systems, current memory translation schemes cannot preserve locality of access and introduce up to an 82 percent slowdown. We discuss the interaction between contemporary OS and VM architectures and argue that partitioning is the best solution for enforcing memory locality. Current I/O solutions that use a single assistant task cannot provide the level of I/O parallelism required by scientific applications, and we observe an average 7.2× application slowdown on a cluster with 16 cores per node. More specialized solutions that implement shared-memory bypass within the communication stack also do not scale well with core count, and we observe an average 2.4× application slowdown. Overall, our results indicate that partitioning combined with direct inter-VM shared memory support is enough to provide close to native performance on multicore clusters.
