Paper Information - Smart Scheduler for CUDA Programming in Heterogeneous CPU/GPU Environment

The demand for high performance has driven rapid growth in computing technology, requiring computer systems to work as efficiently as possible. As a result, GPUs are now used alongside CPUs to make computation more efficient. This paper presents the design of a scheduler for a heterogeneous CUDA environment that ensures all nodes participate fully once a task enters the system, so that the task completes in less time than under conventional schedulers. Tasks are divided among the computing nodes according to node availability, giving the maximum possible system throughput for the workload. The scheduler can run GPU code in parallel on different computing nodes within a High Performance Computing environment, improving overall application performance. The Smart scheduler achieved better throughput than SLURM's existing schedulers, which indicates that those schedulers had room to complete more jobs in less time. The overall throughput improvement was observed to be up to 70 percent, as shown in Figure 4.
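The abstract does not describe the scheduler's internals, so the following is only a minimal sketch of what dividing tasks "according to the availability of the nodes" could look like. It uses plain greedy list scheduling, which is a stand-in technique and not the paper's algorithm. The function name `assign_by_availability`, the node names, and the task costs are all hypothetical.

```python
import heapq


def assign_by_availability(task_costs, node_names):
    """Greedy dispatch: each task goes to the node that becomes free earliest.

    Illustrative assumption only; this is not the Smart scheduler itself.
    """
    if not node_names:
        raise ValueError("at least one node is required")

    # Heap entries are (time_at_which_node_is_free, node_name).
    free_at = [(0.0, name) for name in node_names]
    heapq.heapify(free_at)

    placement = []  # (task_id, node_name, start_time)
    for task_id, cost in enumerate(task_costs):
        start_time, node = heapq.heappop(free_at)   # earliest-available node
        placement.append((task_id, node, start_time))
        heapq.heappush(free_at, (start_time + cost, node))

    # The makespan is the time at which the last node finishes its work.
    makespan = max(finish for finish, _ in free_at)
    return placement, makespan


if __name__ == "__main__":
    # Hypothetical task costs (arbitrary time units) and node names.
    placement, makespan = assign_by_availability([4, 2, 3, 1], ["gpu0", "gpu1"])
    for task_id, node, start in placement:
        print(f"task {task_id} -> {node} at t={start}")
    print("makespan:", makespan)
```

With two nodes, the four hypothetical tasks finish in half the time a single node would need, which is the kind of effect the abstract attributes to keeping every node busy. A real scheduler would also have to account for data transfer, GPU memory, and job priorities, none of which this sketch models.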
