Performance characterization and cache-aware core scheduling in a virtualized multi-core server under 10GbE

Virtual Machine (VM) technology is experiencing a resurgence of interest now that multi-core processors have become the de facto configuration of modern web servers. Multi-core servers can provide enough physical resources to realize the benefits of VMs, including performance isolation, manageability, and scalability. However, the network performance of virtualized multi-core servers falls short of expectations, so it is important to understand where the overhead comes from. In this paper, we evaluate the network performance of a virtualized multi-core server using a TCP streaming microbenchmark (Iperf) and SPECweb2005. We first motivate our research by presenting the performance gap between native and virtualized environments. We then break down the overhead from an architectural viewpoint and show that the cache topology greatly influences performance. We also profile the Virtual Machine Monitor (VMM) at the function level to show that functions in the current version of the Xen scheduler are the major contributors to the poor utilization of the cache topology. Consequently, we implement a static onloading scheme that separates interrupt handling from application processes and executes them on cores with cache affinity. Based on the observed benefits, we modify the Xen scheduler to migrate virtual CPUs dynamically to exploit the cache topology. Our results show that VM performance improves by an average of 12% for Iperf and 15% for SPECweb2005.
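The idea of placing interrupt handling on a core that shares a cache with the application core can be illustrated with a minimal sketch. This is not the paper's implementation or Xen scheduler code; the topology table `L2_GROUPS` and the helpers `cache_siblings` and `pick_interrupt_core` are hypothetical names chosen for illustration, and the assumption that pairs of cores share an L2 cache is ours.

```python
from typing import List, Optional, Set

# Hypothetical topology (an assumption, not taken from the paper):
# each inner list holds the core IDs that share one L2 cache.
L2_GROUPS: List[List[int]] = [[0, 1], [2, 3], [4, 5], [6, 7]]


def cache_siblings(core: int, groups: List[List[int]]) -> List[int]:
    """Return the other cores that share a cache with `core`."""
    for group in groups:
        if core in group:
            return [c for c in group if c != core]
    return []  # unknown core: no known siblings


def pick_interrupt_core(
    app_core: int, groups: List[List[int]], busy: Set[int]
) -> Optional[int]:
    """Pick an idle core that shares a cache with `app_core`.

    Returns None when every cache sibling is busy, so the caller
    can fall back to whatever default placement it already uses.
    """
    for candidate in cache_siblings(app_core, groups):
        if candidate not in busy:
            return candidate
    return None


if __name__ == "__main__":
    # Application runs on core 2; core 3 shares its L2 and is idle.
    print(pick_interrupt_core(2, L2_GROUPS, busy=set()))
```

A static scheme would make this choice once at setup time, whereas a dynamic scheme would repeat a similar check whenever a virtual CPU is rescheduled.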
