Gleaner: Mitigating the Blocked-Waiter Wakeup Problem for Virtualized Multicore Applications

As the number of cores in a multicore node increases in accordance with Moore's law, a natural question is what costs virtualized environments impose when applications scale to take advantage of the larger core counts. While the well-known cost of preempted spinlock holders has been studied extensively, this paper examines another cost that has received little attention: the VMM intervention required during synchronization-induced idling in the application, guest OS, or supporting libraries. We call this the blocked-waiter wakeup (BWW) problem. The paper systematically analyzes the causes of the BWW problem and the performance issues it creates, including longer execution times, reduced system throughput, and performance unpredictability. To address these issues, the paper proposes a solution, Gleaner, which integrates idling operations and imbalanced scheduling to mitigate the problem. We show how Gleaner can be implemented without intrusive modification of the guest OS. Extensive experiments show that Gleaner effectively reduces the virtualization cost incurred by blocking synchronization, improving the performance of individual applications by 16x and system throughput by 3x.
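To make the source of the BWW cost concrete, the sketch below shows one common pattern of blocking synchronization, not code from the paper: a minimal spin-then-block lock built on the Linux futex syscall (the names SPIN_LIMIT, lock_acquire, and lock_release are chosen here purely for illustration). Once the spin budget is exhausted, the waiter blocks in the kernel; inside a guest VM the idled vCPU is typically halted, so waking the blocked waiter later requires intervention from the VMM, which is the kind of cost the paper analyzes.

```c
/*
 * Illustrative sketch only: a minimal spin-then-block lock on the Linux
 * futex syscall. Blocking in the kernel idles the vCPU in a guest VM,
 * so the later wakeup goes through the VMM (the BWW cost).
 */
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

#define SPIN_LIMIT 1000          /* arbitrary spin budget before blocking */

static atomic_int lock_word = ATOMIC_VAR_INIT(0); /* 0 = free, 1 = held */

static long futex(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void lock_acquire(void)
{
    for (;;) {
        int expected = 0;
        if (atomic_compare_exchange_strong(&lock_word, &expected, 1))
            return;                          /* acquired the lock */

        /* Phase 1: spin briefly, hoping the holder releases soon. */
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (atomic_load(&lock_word) == 0)
                break;
        }

        /* Phase 2: block in the kernel if the lock is still held.
         * In a VM this idles the vCPU; waking it later needs the VMM. */
        if (atomic_load(&lock_word) != 0)
            futex(&lock_word, FUTEX_WAIT, 1);
    }
}

static void lock_release(void)
{
    atomic_store(&lock_word, 0);
    futex(&lock_word, FUTEX_WAKE, 1);        /* wake one blocked waiter */
}
```

Spin-then-block waiting of this kind performs well on bare metal, but in a virtualized guest the block-and-wakeup path becomes visible to the VMM, which is precisely the behavior the paper's analysis and Gleaner's mitigation target.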
