Opportunistic Spinlocks: Achieving Virtual Machine Scalability in the Clouds

With increasing demand for big-data processing and faster in-memory databases, cloud providers are moving toward large virtualized instances in addition to focusing on horizontal scalability. However, our experiments reveal that such instances in popular cloud services (e.g., 32 vCPUs with 208 GB of memory, as supported by Google Compute Engine) do not achieve the desired scalability with increasing core count, even for a simple, embarrassingly parallel job (e.g., a Linux kernel compile). More seriously, the internal synchronization scheme (e.g., the paravirtualized ticket spinlock) of a virtualized instance on a machine with a higher core count (e.g., 80 cores) dramatically degrades its overall performance. Our finding differs from the previously well-known scalability problem (i.e., lock contention) and arises from the sophisticated optimization techniques implemented in the hypervisor; we call this the sleepy spinlock anomaly. To solve this problem, we design and implement OTICKET, a variant of the paravirtualized ticket spinlock that effectively scales virtualized instances in both undersubscribed and oversubscribed environments.
