Lock-Visor: An Efficient Transitory Co-scheduling for MP Guest

Multiprocessor (MP) virtual machines (VMs) are widely used in cloud environments. However, MP VMs suffer from the lock holder preemption (LHP) problem, which wastes a tremendous number of CPU cycles, increases synchronization latency, and significantly degrades system performance. Previous work has addressed the problem with software co-scheduling or lock waiter yielding. However, co-scheduling suffers from CPU utility fragmentation, priority inversion, and a loss of flexibility in the hypervisor scheduler, all of which make CPU usage inefficient. Lock waiter yielding, the other solution, has a large impact on the hypervisor scheduler and suffers from poor response latency. In this paper, we propose Lock-visor, an efficient transitory co-scheduling algorithm that bypasses the guest spin lock loop effectively. Our protocol has little to no impact on the flexibility of the hypervisor scheduler and achieves better system performance. To maximize the efficiency of Lock-visor, we explore multiple policies on top of transitory co-scheduling: instant transitory, selective instant transitory, and deferred transitory co-scheduling. We conduct comprehensive experiments using CPU-intensive, I/O-intensive, and lock-intensive workloads. Our experimental results show that Lock-visor can significantly improve system performance (e.g., Lock-visor has up to a 341.3% performance advantage over the original Linux kernel 2.6.38 in the SysBench 4-VM case), while also improving system latency with little to no effect on scheduling fairness.
