Speeding up packet I/O in virtual machines

Most of the work on VM network performance has so far focused on bulk TCP traffic, which covers the classical applications of virtualization. Completely new “paravirtualized devices” (Xenfront, VIRTIO, vmxnet) have been designed and implemented to improve network throughput. We expect virtualization to become widely used for other workloads as well: packet switching devices and middleboxes, Software Defined Networks, etc. These applications involve very high packet rates, which are problematic not only for the hypervisor (which emulates network interfaces) but also for the host itself (which switches packets between guests and physical NICs). In this paper we provide three main results. First, we demonstrate how rates of millions of packets per second can be achieved even within VMs, with limited but targeted modifications to device drivers, hypervisors and the host's virtual switch. Second, we show that emulation of conventional NICs (e.g., Intel e1000) is perfectly capable of achieving such packet rates, without requiring completely different device models. Finally, we provide sets of modifications suitable for different use cases (acting only on the guest, only on the host, or on both), which can improve the network throughput of a VM by a factor of 20 or more. These results are important because they enable a new set of applications within virtual machines. In particular, we achieve guest-to-guest UDP rates of over 1 Mpps with short frames (and 6 Gbit/s with 1500-byte frames) using a conventional e1000 device and socket-based senders and receivers. This matches the speed of the OS on bare metal. Furthermore, we reach over 5 Mpps when guests use the netmap API. Our work requires only small changes to device drivers (about 100 lines for each of the FreeBSD and Linux versions of the e1000 driver), similarly small modifications to the hypervisor (a QEMU prototype is available) and the use of the VALE switch as a network backend.
The relevant changes are being incorporated into FreeBSD, QEMU and Linux, or distributed as external patches for them.
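
The packet rates quoted above translate directly into a per-packet time budget, which is what makes them hard for the hypervisor and the host. The short Python sketch below does the arithmetic for the figures in this abstract. The helper names are hypothetical, only frame bytes are counted (Ethernet preamble and inter-frame gap are ignored), and "short frames" is assumed here to mean 64-byte minimum-size Ethernet frames, which the abstract does not state.

```python
def ns_per_packet(pps: float) -> float:
    """Per-packet time budget, in nanoseconds, at a given packet rate."""
    return 1e9 / pps


def pps_from_bitrate(bits_per_s: float, frame_bytes: int) -> float:
    """Packet rate implied by a bit rate and a frame size.

    Counts frame bytes only; preamble and inter-frame gap are ignored.
    """
    return bits_per_s / (frame_bytes * 8)


def bitrate_from_pps(pps: float, frame_bytes: int) -> float:
    """Bit rate (frame bytes only) carried by a given packet rate."""
    return pps * frame_bytes * 8


if __name__ == "__main__":
    # Packet rates quoted in the abstract.
    for label, pps in [("1 Mpps (sockets, e1000)", 1e6),
                       ("5 Mpps (netmap API)", 5e6)]:
        print(f"{label}: {ns_per_packet(pps):.0f} ns per packet")

    # 6 Gbit/s with 1500-byte frames corresponds to roughly 0.5 Mpps.
    print(f"6 Gbit/s @ 1500 B: {pps_from_bitrate(6e9, 1500):,.0f} pps")

    # Assumption: "short frames" taken as 64-byte minimum Ethernet frames.
    print(f"1 Mpps @ 64 B: {bitrate_from_pps(1e6, 64) / 1e9:.3f} Gbit/s")
```

Under these assumptions, 1 Mpps leaves about 1 µs per packet for the whole path (guest driver, device emulation, host switch) and 5 Mpps only 200 ns, whereas 6 Gbit/s with 1500-byte frames is only about 0.5 Mpps, and 1 Mpps of 64-byte frames carries roughly 0.5 Gbit/s. This illustrates why short-frame workloads stress per-packet costs far more than bulk transfers do.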
