Characterizing the Performance and Scalability of Many-core Applications on Virtualized Platforms

Clouds have become attractive to applications because of their low cost and on-demand computing model, both enabled by virtualization technologies. As the number of cores per chip keeps increasing, it has become urgent to study and improve the scalability of virtualized platforms. This paper studies the horizontal scalability of a set of parallel applications on virtualized platforms. We execute and profile these applications in a virtual machine configured with different numbers of cores on a commodity 48-core machine. In doing so, we identify several performance bottlenecks inside the Xen virtual machine monitor under different paging modes (e.g., direct paging and nested paging). Based on detailed profiling and analysis, we propose several remedies totaling fewer than 100 lines of code that avoid most of these bottlenecks. They yield performance improvements ranging from 1.1X to 9.42X for a virtual machine configured with 32 cores, and scalability also improves notably. One speculative conclusion from this study is that current virtual machine monitors may have some scalability issues, but some of these issues should be relatively easy to fix on commodity multicore platforms.
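One way to make "horizontal scalability" and the reported improvement factors concrete is the conventional speedup metric. The sketch below assumes speedup S(n) = T(1)/T(n), parallel efficiency E(n) = S(n)/n, and an improvement factor equal to the pre-fix execution time divided by the post-fix time at a fixed core count. These definitions are standard assumptions, not necessarily the exact metrics used in the paper. All timings and the helper names (`speedup`, `efficiency`, `improvement`) are hypothetical and chosen only for illustration.

```python
"""Hypothetical sketch: quantifying the horizontal scalability of a parallel
application inside a VM as the number of virtual CPUs (vCPUs) grows.
Metric definitions are assumed conventions; timings are invented."""

from typing import Dict


def speedup(times: Dict[int, float]) -> Dict[int, float]:
    """Speedup S(n) = T(1) / T(n), relative to the 1-vCPU run."""
    base = times[1]
    return {n: base / t for n, t in sorted(times.items())}


def efficiency(times: Dict[int, float]) -> Dict[int, float]:
    """Parallel efficiency E(n) = S(n) / n; 1.0 means perfect linear scaling."""
    return {n: s / n for n, s in speedup(times).items()}


def improvement(baseline: float, patched: float) -> float:
    """Improvement factor of a fix: baseline time divided by patched time."""
    return baseline / patched


# Hypothetical execution times in seconds, keyed by vCPU count
# (NOT measurements from the paper).
unpatched = {1: 100.0, 8: 20.0, 16: 25.0, 32: 40.0}
patched = {1: 100.0, 8: 14.0, 16: 8.0, 32: 5.0}

for label, times in (("unpatched", unpatched), ("patched", patched)):
    eff = efficiency(times)
    print(label, {n: round(e, 2) for n, e in eff.items()})

print("improvement at 32 vCPUs:", improvement(unpatched[32], patched[32]))
```

In this invented data, the unpatched run slows down beyond 8 vCPUs, which is the kind of negative scaling that a hypervisor bottleneck can produce. Comparing efficiency across core counts, rather than only raw time at one configuration, shows whether a fix improves scalability or merely shifts the curve.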
