Hierarchical Bucket Queuing for Fine‐Grained Priority Scheduling on the GPU

While the modern graphics processing unit (GPU) offers massive parallel compute power, the ability to influence the scheduling of these resources is severely limited. As a result, the GPU is widely considered suitable only as an externally controlled co‐processor for homogeneous workloads, which greatly restricts the potential applications of GPU computing. To address this issue, we present a new method to achieve fine‐grained priority scheduling on the GPU: hierarchical bucket queuing. By carefully distributing the workload among multiple queues and efficiently deciding which queue to draw work from next, we enable a variety of scheduling strategies, including fair scheduling, earliest‐deadline‐first scheduling and user‐defined dynamic priority scheduling. In a comparison with a sorting‐based approach, we show the advantages of hierarchical bucket queuing over previous work. Finally, we demonstrate the benefits of priority scheduling in real‐world applications, using path tracing and foveated micropolygon rendering as examples.
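
To make the idea of distributing work among multiple queues and quickly deciding which queue to draw from more concrete, the following is a minimal sequential C++ sketch. It is not the implementation described in this paper: the class name `BucketQueue`, the fan-out of 32, the two-level bitmask layout and the convention that priority 0 is the most urgent are all assumptions made for illustration, and a real GPU version would additionally need atomic operations and per-queue synchronization.

```cpp
#include <array>
#include <cstdint>
#include <deque>

// Hypothetical, single-threaded sketch of a two-level bucket queue.
// All names and sizes are illustrative assumptions, not taken from the paper.
class BucketQueue {
public:
    static constexpr unsigned kFanout  = 32;                // bits per mask word
    static constexpr unsigned kBuckets = kFanout * kFanout; // 1024 priority levels

    // Enqueue a task id; priority 0 is treated as the most urgent (assumption).
    void push(unsigned priority, int task) {
        if (priority >= kBuckets) priority = kBuckets - 1;  // clamp out-of-range values
        buckets_[priority].push_back(task);
        // Mark the bucket as non-empty on both levels of the hierarchy.
        leaf_[priority / kFanout] |= 1u << (priority % kFanout);
        top_ |= 1u << (priority / kFanout);
    }

    // Dequeue the oldest task from the most urgent non-empty bucket.
    // Returns false if no work is left.
    bool pop(int& task) {
        if (top_ == 0) return false;
        const unsigned group = lowestSetBit(top_);          // first non-empty group
        const unsigned slot  = lowestSetBit(leaf_[group]);  // first non-empty bucket in it
        std::deque<int>& q = buckets_[group * kFanout + slot];
        task = q.front();
        q.pop_front();
        if (q.empty()) {
            // Clear the occupancy bits bottom-up once a bucket runs dry.
            leaf_[group] &= ~(1u << slot);
            if (leaf_[group] == 0) top_ &= ~(1u << group);
        }
        return true;
    }

private:
    // Index of the lowest set bit; the caller guarantees v != 0.
    static unsigned lowestSetBit(std::uint32_t v) {
        unsigned i = 0;
        while (((v >> i) & 1u) == 0) ++i;
        return i;
    }

    std::array<std::deque<int>, kBuckets> buckets_;   // one FIFO per priority level
    std::array<std::uint32_t, kFanout> leaf_{};       // one occupancy bit per bucket
    std::uint32_t top_ = 0;                           // one occupancy bit per leaf word
};
```

In this sketch, selecting the next queue costs two bit scans regardless of how many tasks are pending, whereas a sorting-based scheduler has to reorder its queue as new work arrives. Tasks that share a priority level are served in insertion order.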
