Virtualization polling engine (VPE): using dedicated CPU cores to accelerate I/O virtualization

Virtual machine (VM) technologies are making rapid progress, and VM performance is approaching that of native hardware in many respects. Achieving high performance for I/O virtualization remains a challenge, however, especially for high-speed networking devices such as 10 Gigabit Ethernet (10 GbE) NICs. Traditional software-based approaches to I/O virtualization usually suffer significant performance degradation compared with native hardware. Hardware-based approaches that allow direct device access in VMs can achieve good performance, but at the expense of higher hardware cost and added complexity in tasks such as VM checkpointing, migration, and record/replay. Recently, the trend in microprocessor design has shifted from achieving higher CPU frequencies to putting more cores on a single chip, so the cost of each core is rapidly decreasing. In this paper, we propose a new I/O virtualization approach called the Virtualization Polling Engine (VPE). VPE introduces a concept called virtualization onload, which uses dedicated CPU cores to help virtualize I/O devices through an event-driven execution model with dedicated polling threads. It can significantly reduce virtualization overhead and achieve performance close to that of hardware-based approaches without requiring special hardware support. Using our VPE approach, we developed a prototype called KVM-VPE to provide Ethernet virtualization support for KVM. Our experiments on a 10 GbE testbed showed that VPE significantly outperformed the original KVM. In Netperf TCP tests, our prototype achieved over 5 times the bandwidth for transmitting (Tx) and over 3 times the bandwidth for receiving (Rx) compared with the original KVM. KVM-VPE also supports direct user application access to the virtual Ethernet interfaces and achieved 7.4 μs end-to-end latency between two VMs on different machines in our testbed. Overall, our research demonstrated that VPE is a promising approach to high-performance I/O virtualization in the coming multicore era.
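
The event-driven polling model described above can be illustrated with a minimal sketch. This is not the KVM-VPE implementation: the class name `PollingEngine`, the per-VM queues, and the event names are hypothetical, chosen only to show the general idea of a dedicated thread that repeatedly polls guest request queues and dispatches handlers instead of waiting for traps or interrupts. Core pinning uses `os.sched_setaffinity`, which is available only on Linux, and is skipped elsewhere.

```python
import os
import threading
from collections import deque


class PollingEngine:
    """Hypothetical sketch: one thread polls per-VM queues and dispatches events."""

    def __init__(self, core=None):
        self.core = core        # CPU core to dedicate to polling (assumption)
        self.queues = {}        # vm_id -> deque of (event_type, payload)
        self.handlers = {}      # event_type -> callable(vm_id, payload)
        self.running = False
        self.thread = None

    def register_vm(self, vm_id):
        # Each VM gets its own request queue that the guest side appends to.
        self.queues[vm_id] = deque()
        return self.queues[vm_id]

    def on(self, event_type, handler):
        self.handlers[event_type] = handler

    def poll_once(self):
        # One polling pass: drain every VM queue and run the matching handler.
        handled = 0
        for vm_id, queue in list(self.queues.items()):
            while queue:
                event_type, payload = queue.popleft()
                self.handlers[event_type](vm_id, payload)
                handled += 1
        return handled

    def _run(self):
        # Pin the polling thread to its core where the platform allows it.
        if self.core is not None and hasattr(os, "sched_setaffinity"):
            try:
                os.sched_setaffinity(0, {self.core})
            except OSError:
                pass  # core not available to this process; keep polling unpinned
        while self.running:
            self.poll_once()  # busy-poll: the core is spent on polling by design

    def start(self):
        self.running = True
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def stop(self):
        self.running = False
        if self.thread is not None:
            self.thread.join()


# Usage: a single polling pass over two queued transmit requests.
engine = PollingEngine()
sent = []
engine.on("tx", lambda vm_id, frame: sent.append((vm_id, frame)))
queue = engine.register_vm("vm0")
queue.append(("tx", b"frame-1"))
queue.append(("tx", b"frame-2"))
engine.poll_once()
```

The sketch trades a fully occupied core for the removal of per-request notification cost, which is the trade-off the abstract motivates with the falling cost of cores.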
