Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems

Heterogeneous multi-cores--platforms comprised of both general purpose and accelerator cores--are becoming increasingly common. While applications wish to freely utilize all cores present on such platforms, operating systems continue to view accelerators as specialized devices. The Pegasus system described in this paper uses an alternative approach that offers a uniform resource usage model for all cores on heterogeneous chip multiprocessors. Operating at the hypervisor level, its novel scheduling methods fairly and efficiently share accelerators across multiple virtual machines, thereby making accelerators into first class schedulable entities of choice for many-core applications. Using NVIDIA GPGPUs coupled with x86-based general purpose host cores, a Xen-based implementation of Pegasus demonstrates improved performance for applications by better managing combined platform resources. With moderate virtualization penalties, performance improvements range from 18% to 140% over base GPU driver scheduling when the GPUs are shared.

[1]  Sudhakar Yalamanchili,et al.  A characterization and analysis of PTX kernels , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[2]  Vanish Talwar,et al.  GViM: GPU-accelerated virtual machines , 2009, HPCVirt '09.

[3]  Mendel Rosenblum,et al.  Cellular disco: resource management using virtual clusters on shared-memory multiprocessors , 2000, TOCS.

[4]  Galen C. Hunt,et al.  Helios: heterogeneous multiprocessing with satellite kernels , 2009, SOSP '09.

[5]  Wen-mei W. Hwu,et al.  Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.

[6]  Gregory Diamos,et al.  An Execution Model and Runtime for Heterogeneous Many Core Systems , 2011 .

[7]  Kevin Skadron,et al.  Enabling Task Parallelism in the CUDA Scheduler , 2009 .

[8]  Eyal de Lara,et al.  VMM-independent graphics acceleration , 2007, VEE '07.

[9]  Karsten Schwan,et al.  Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community , 2011, Computing in Science & Engineering.

[10]  Henry Wong,et al.  Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[11]  Gregory Diamos,et al.  Harmony: an execution model and runtime for heterogeneous many core systems , 2008, HPDC '08.

[12]  Wolfgang Rehm,et al.  ACCFS - Operating System Integration of Computational Accelerators Using a VFS Approach , 2009, ARC.

[13]  Vishakha Gupta,et al.  Cellule: Lightweight Execution Environment for Accelerator-based Systems , 2010 .

[14]  Adrian Schüpbach,et al.  The multikernel: a new OS architecture for scalable multicore systems , 2009, SOSP '09.

[15]  Uday Bondhugula,et al.  Believe it or not! multi-core CPUs can match GPU performance for a FLOP-intensive application! , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[16]  Richard Szeliski,et al.  Modeling the World from Internet Photo Collections , 2008, International Journal of Computer Vision.

[17]  Grigori Fursin,et al.  Predictive Runtime Code Scheduling for Heterogeneous Architectures , 2008, HiPEAC.

[18]  David Chisnall,et al.  The Definitive Guide to the Xen Hypervisor , 2007 .

[19]  Vanish Talwar,et al.  vManage: loosely coupled platform and virtualization management in data centers , 2009, ICAC '09.

[20]  Suprio Ray,et al.  Cypress : A Scheduling Infrastructure for a Many-Core Hypervisor , 2008 .

[21]  Jeremy Sugerman,et al.  GPU virtualization on VMware's hosted I/O architecture , 2008, OPSR.

[22]  Hyesoon Kim,et al.  Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[23]  Aaftab Munshi,et al.  The OpenCL specification , 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).

[24]  Chen-Yong Cher,et al.  A wire-speed powerTM processor: 2.3GHz 45nm SOI with 16 cores and 64 threads , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[25]  Karsten Schwan,et al.  High performance and scalable I/O virtualization via self-virtualized devices , 2007, HPDC '07.