Software techniques to improve virtualized I/O performance on multi-core systems

Virtualization technology is now widely deployed on high performance networks such as 10-Gigabit Ethernet (10GE). It offers useful features like functional isolation, manageability and live migration. Unfortunately, the overhead of network I/O virtualization significantly degrades the performance of network-intensive applications. Two major factors of loss in I/O performance result from the extra driver domain to process I/O requests and the extra scheduler inside the virtual machine monitor (VMM) for scheduling domains. In this paper we first examine the negative effect of virtualization in multi-core platforms with 10GE networking. We study virtualization overhead and develop two optimizations for the VMM scheduler to improve I/O performance. The first solution uses cache-aware scheduling to reduce inter-domain communication cost. The second solution steals scheduler credits to favor I/O VCPUs in the driver domain. We also propose two optimizations to improve packet processing in the driver domain. First we re-design a simple bridge for more efficient switching of packets. Second we develop a patch to make transmit (TX) queue length in the driver domain configurable and adaptable to 10GE networks. Using all the above techniques, our experiments show that virtualized I/O bandwidth can be increased by 96%. Our optimizations also improve the efficiency by saving 36% in core utilization per gigabit. All the optimizations are based on pure software approaches and do not hinder live migration. We believe that the findings from our study will be useful to guide future VMM development.

[1]  Willy Zwaenepoel,et al.  Diagnosing performance overheads in the xen virtual machine environment , 2005, VEE '05.

[2]  Charles L. Seitz,et al.  Myrinet: A Gigabit-per-Second Local Area Network , 1995, IEEE Micro.

[3]  Dhabaleswar K. Panda,et al.  High Performance VMM-Bypass I/O in Virtual Machines , 2006, USENIX Annual Technical Conference, General Track.

[4]  Alan L. Cox,et al.  Scheduling I/O in virtual machine monitors , 2008, VEE '08.

[5]  Tal Garfinkel,et al.  Virtual machine monitors: current technology and future trends , 2005, Computer.

[6]  Dhabaleswar K. Panda,et al.  Performance characterization of a 10-Gigabit Ethernet TOE , 2005, 13th Symposium on High Performance Interconnects (HOTI'05).

[7]  Wu-chun Feng,et al.  An Analysis of 10-Gigabit Ethernet Protocol Stacks in Multicore Environments , 2007, 15th Annual IEEE Symposium on High-Performance Interconnects (HOTI 2007).

[8]  Wu-chun Feng,et al.  The Quadrics network (QsNet): high-performance clustering technology , 2001, HOT 9 Interconnects. Symposium on High Performance Interconnects.

[9]  Alan L. Cox,et al.  Optimizing network virtualization in Xen , 2006 .

[10]  Andrew Warfield,et al.  Live migration of virtual machines , 2005, NSDI.

[11]  Robin Fairbairns,et al.  The Design and Implementation of an Operating System to Support Distributed Multimedia Applications , 1996, IEEE J. Sel. Areas Commun..

[12]  Andrew Warfield,et al.  Safe Hardware Access with the Xen Virtual Machine Monitor , 2007 .

[13]  Wu-chun Feng,et al.  An Analysis of 10-Gigabit Ethernet Protocol Stacks in Multicore Environments , 2007 .

[14]  Wu-chun Feng,et al.  The Quadrics Network: High-Performance Clustering Technology , 2002, IEEE Micro.

[15]  Ram Huggahalli,et al.  Direct cache access for high bandwidth network I/O , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).