Scaling Guest OS Critical Sections with eCS

Multi-core virtual machines (VMs) are now the norm in data-center environments. However, one well-known problem that VMs suffer from is the vCPU scheduling problem, which causes poor scalability. More specifically, its symptoms appear as preemption problems in both under- and over-committed scenarios. Although prior research efforts attempted to alleviate these symptoms separately, they fail to address their common root cause: the semantic gap that arises when a guest OS is preempted while executing its own critical section, which degrades application scalability.

In this work, we strive to address all of these preemption problems together by bridging the semantic gap between guest OSes and the hypervisor: the hypervisor now knows whether a guest OS is running in a critical section, and the guest OS has access to the hypervisor's scheduling context. We annotate all critical sections using lightweight para-virtualized APIs, yielding what we call enlightened critical sections (eCS), which provide scheduling hints to both the hypervisor and VMs. The hypervisor uses these hints to reschedule a vCPU so as to fundamentally overcome the double-scheduling problem for annotated critical sections, while VMs use the hypervisor-provided hints to further mitigate the blocked-waiter wake-up problem.

Our evaluation shows that eCS guarantees the forward progress of a guest OS by 1) decreasing preemption counts by 85–100% while 2) improving application throughput by up to 2.5× in an over-committed scenario and 1.6× in an under-committed scenario for various real-world workloads on an 80-core machine.
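
How such an annotation might look in practice: a natural realization is a per-vCPU counter on a page shared between the guest and the hypervisor, which the guest bumps around each critical section and the hypervisor consults before descheduling the vCPU. The C sketch below is a minimal illustration under that assumption; the names ecs_state, non_preemptable_count, ecs_enter, ecs_exit, and vcpu_safe_to_preempt are hypothetical, not the paper's actual API.

    #include <stdbool.h>

    /* One instance per vCPU, on a page mapped into both guest and hypervisor. */
    struct ecs_state {
        volatile int non_preemptable_count;  /* guest writes, hypervisor reads */
    };

    /* Guest side: mark entry to and exit from a critical section. */
    static inline void ecs_enter(struct ecs_state *st)
    {
        st->non_preemptable_count++;
        __asm__ volatile("" ::: "memory");   /* keep the store before the CS body */
    }

    static inline void ecs_exit(struct ecs_state *st)
    {
        __asm__ volatile("" ::: "memory");   /* keep the CS body before the store */
        st->non_preemptable_count--;
    }

    /* Hypervisor side: peek at the shared counter before descheduling a vCPU;
     * if the guest is inside a critical section, grant it one extra time slice
     * instead of preempting it mid-section. */
    static inline bool vcpu_safe_to_preempt(const struct ecs_state *st)
    {
        return st->non_preemptable_count == 0;
    }

Because the counter is purely advisory, a misbehaving guest can at worst delay its own preemption by one time slice, which keeps the hint safe for the hypervisor to honor.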

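On the VM side, the blocked-waiter wake-up mitigation can be pictured as a spin-then-yield lock: a waiter keeps spinning only while the lock holder's vCPU is actually running, and yields its own vCPU otherwise. The C sketch below is a user-level illustration under the assumption that the hypervisor exposes a per-vCPU preemption flag through a hypothetical vcpu_is_preempted() helper (Linux's paravirt hook of the same name plays a similar role); it is not the paper's implementation.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <sched.h>

    /* Assumed hypervisor-provided hint: is this vCPU currently descheduled? */
    extern bool vcpu_is_preempted(int vcpu);

    struct ecs_spinlock {
        atomic_int locked;   /* 0 = free, 1 = held */
        atomic_int holder;   /* vCPU id of the current holder (advisory) */
    };

    void ecs_spin_lock(struct ecs_spinlock *lk, int my_vcpu)
    {
        for (;;) {
            int expected = 0;
            if (atomic_compare_exchange_weak(&lk->locked, &expected, 1)) {
                atomic_store(&lk->holder, my_vcpu);
                return;
            }
            /* Spinning on a lock whose holder is preempted burns the time
             * slice for nothing; yield so the holder can run again sooner. */
            if (vcpu_is_preempted(atomic_load(&lk->holder)))
                sched_yield();
        }
    }

    void ecs_spin_unlock(struct ecs_spinlock *lk)
    {
        atomic_store(&lk->locked, 0);
    }

The holder field is only a hint: a stale read causes at most one unnecessary yield or one extra spin, never incorrect locking.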