High Performance and Scalable Virtual Machine Storage I/O Stack for Multicore Systems

Extending virtualization technology to high-performance cluster platforms opens up new possibilities, including dynamic allocation of resources to jobs, easier sharing of resources between jobs, easy checkpointing of jobs, and deployment of job-specific work environments. However, an I/O scalability problem in the virtualization layer may keep virtualization from being widely adopted in high-performance computing: performance degrades sharply when a virtual machine uses a multi-queue, high-performance non-volatile storage device as its secondary storage. The cause is the current virtual block I/O layer, which uses a single I/O thread to handle all I/O operations to a virtualized storage device. As the number of I/O-intensive workloads grows, mutex contention on the I/O thread rises, because only one of them is allowed to run at any given instant. Resolving this bottleneck is therefore key to improving block I/O performance under virtualization. In this paper, we propose a novel design of a high-performance block I/O stack that addresses this problem. The proposed method uses multiple I/O threads to handle all I/O operations to one storage device in parallel, which frees workloads from I/O contention inside the hypervisor. In addition, we use switch-less mechanisms to reduce the overhead of sending notifications between a VM and its hypervisor, and we improve I/O affinity by assigning a distinct dedicated core to each I/O thread, eliminating unnecessary scheduling. The prototype is implemented on the Linux 3.19 kernel and Quick Emulator (QEMU) 2.3.1, and is deployed on a POWER8 server for detailed evaluation. The experimental results show that the proposed architecture scales gracefully in multi-core environments. For example, a test with 10-way parallel I/O-intensive workloads achieves an 800% improvement over the single-core implementation, indicating that block I/O performance in a virtual machine is close to that of a bare-metal system.
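The threading structure described in the abstract (several I/O threads serving one device, each tied to its own core) can be illustrated with a minimal sketch. This is not the authors' QEMU implementation: the function name `run_io_workers`, the round-robin mapping of requests to queues, and the use of Python threads are all assumptions made for illustration, and Python's global interpreter lock means the sketch shows the structure rather than real parallel speedup. Core pinning uses `os.sched_setaffinity`, which is available on Linux only, so it is applied on a best-effort basis.

```python
import os
import queue
import threading


def run_io_workers(requests, num_queues):
    """Serve `requests` with one worker thread per queue (hypothetical helper).

    Returns a list with one entry per queue, holding the requests that
    queue's worker completed, in completion order.
    """
    queues = [queue.Queue() for _ in range(num_queues)]
    completed = [[] for _ in range(num_queues)]

    # CPUs this process may use; empty on platforms without affinity support.
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))
    else:
        cpus = []

    def worker(idx):
        # Best-effort pinning of this thread to one core. The abstract assigns
        # a distinct core per I/O thread; wrapping around when there are more
        # threads than cores is an assumption of this sketch.
        if cpus:
            try:
                os.sched_setaffinity(0, {cpus[idx % len(cpus)]})
            except OSError:
                pass  # pinning is optional here
        while True:
            req = queues[idx].get()
            if req is None:  # sentinel: no more requests for this queue
                break
            # A real I/O thread would submit the request to the device here.
            completed[idx].append(req)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_queues)]
    for t in threads:
        t.start()

    # Assumed dispatch policy: spread integer request IDs round-robin.
    for req in requests:
        queues[req % num_queues].put(req)
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return completed


if __name__ == "__main__":
    for i, done in enumerate(run_io_workers(list(range(20)), 4)):
        print(f"queue {i}: {done}")
```

Because each queue has exactly one consumer, no lock is shared between workers on the request path, which is the property the single-I/O-thread design lacks.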
