Scalability in the Clouds!: A Myth or Reality?

With increasing demand for big-data processing and faster in-memory databases, cloud providers are moving towards large virtualized instances rather than horizontal scalability. However, our experiments reveal that such instances in popular cloud services (e.g., 32 vCPUs with 208 GB of memory, as supported by Google Compute Engine) do not achieve the desired scalability as core count increases, even with a simple, embarrassingly parallel job (e.g., a kernel compile). More seriously, the internal synchronization scheme (e.g., the paravirtualized ticket spinlock) of a virtualized instance on a machine with a higher core count (e.g., 80 cores) dramatically degrades its overall performance. Our finding differs from the previously well-known scalability problem (the lock contention problem): it occurs because of the sophisticated optimization techniques implemented in the hypervisor, and we call it the sleepy spinlock anomaly. To solve this problem, we design and implement oticket, a variant of the paravirtualized ticket spinlock that effectively scales virtualized instances in both undersubscribed and oversubscribed environments.
