Tableau: a high-throughput and predictable VM scheduler for high-density workloads

In the increasingly competitive public-cloud marketplace, improving the efficiency of data centers is a major concern. One way to improve efficiency is to consolidate as many VMs onto as few physical cores as possible, provided that performance expectations are not violated. However, as a prerequisite for increased VM densities, the hypervisor's VM scheduler must allocate processor time efficiently and in a timely fashion. As we show in this paper, contemporary VM schedulers leave substantial room for improvement in both regards when facing challenging high-VM-density workloads that frequently trigger the VM scheduler. As root causes, we identify (i) high runtime overheads and (ii) unpredictable scheduling heuristics. To better support high VM densities, we propose Tableau, a VM scheduler that guarantees a minimum processor share and a maximum bound on scheduling delay for every VM in the system. Tableau combines a low-overhead, core-local, table-driven dispatcher with a fast on-demand table-generation procedure (triggered on VM creation/teardown) that employs scheduling techniques typically used in hard real-time systems. In an evaluation of Tableau and three current Xen schedulers on a 16-core Intel Xeon machine, Tableau is shown to improve both tail latency (e.g., a 17X reduction in maximum ping latency compared to Credit) and throughput (e.g., 1.6X higher peak web server throughput compared to RTDS when serving 1 KiB files with a 100 ms SLA).
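To make the idea of a core-local, table-driven dispatcher concrete, here is a minimal Python sketch, assuming a single core whose schedule is a precomputed cyclic table of time slots (the table-generation step is not shown). The names `Slot`, `CoreTable`, `dispatch`, `share`, and `max_delay` are hypothetical illustrations, not Tableau's actual data structures or API, and time is measured in arbitrary integer units. The sketch shows why such a dispatcher is cheap at runtime (a modulo and a binary search, with no scheduling heuristics) and how a fixed table directly yields a per-VM processor share and a worst-case gap between allocations.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Slot:
    start: int          # offset within the table period (arbitrary integer time units)
    end: int            # exclusive end offset
    vm: Optional[str]   # VM that runs in this slot, or None for idle time


class CoreTable:
    """Hypothetical per-core cyclic dispatch table (not Tableau's actual data structure)."""

    def __init__(self, period: int, slots: List[Slot]):
        # The slots must tile [0, period) exactly: sorted, contiguous, non-empty.
        slots = sorted(slots, key=lambda s: s.start)
        expected = 0
        for s in slots:
            if s.start != expected or s.end <= s.start:
                raise ValueError("slots must tile [0, period) without gaps or overlaps")
            expected = s.end
        if expected != period:
            raise ValueError("slots must end exactly at the period boundary")
        self.period = period
        self.slots = slots
        self._starts = [s.start for s in slots]

    def dispatch(self, now: int) -> Optional[str]:
        """Return the VM scheduled at absolute time `now` via a binary search; no runtime heuristics."""
        offset = now % self.period
        idx = bisect_right(self._starts, offset) - 1
        return self.slots[idx].vm

    def share(self, vm: str) -> float:
        """Fraction of every period reserved for `vm` (its guaranteed minimum share in this model)."""
        reserved = sum(s.end - s.start for s in self.slots if s.vm == vm)
        return reserved / self.period

    def max_delay(self, vm: str) -> int:
        """Longest interval, wrapping across periods, during which `vm` receives no processor time."""
        runs = [(s.start, s.end) for s in self.slots if s.vm == vm]
        if not runs:
            raise ValueError(f"{vm!r} has no slots in this table")
        gaps = [nxt[0] - cur[1] for cur, nxt in zip(runs, runs[1:])]
        gaps.append(self.period - runs[-1][1] + runs[0][0])  # gap across the period boundary
        return max(gaps)


# Usage: VM "a" gets two 3-unit slots, "b" one 2-unit slot, per 10-unit period.
table = CoreTable(10, [Slot(0, 3, "a"), Slot(3, 5, "b"), Slot(5, 8, "a"), Slot(8, 10, None)])
print(table.dispatch(13))    # b   (13 mod 10 = 3, the start of b's slot)
print(table.share("a"))      # 0.6
print(table.max_delay("b"))  # 8
```

The design point this illustrates is that all decisions about who runs when are made at table-generation time, so the per-core hot path stays trivial and the share and delay guarantees can be read directly off the table. A real system would also need to generate tables across many cores and coordinate between them, which this sketch omits.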
