GPU Scheduling for Short Tasks in Private Cloud

GPUs are expensive, and few individuals can afford one of their own. Sharing GPUs among a group of users therefore lowers cost and reduces GPU idling. Unlike jobs in production environments, which often last for days or weeks, programs in development and testing environments tend to run for a much shorter time. Dedicating a GPU to a single person for development therefore leaves the GPU idle much of the time. For economic reasons, researchers, especially those in small teams or labs, usually share a small number of GPUs for development. Users want to lease and release GPUs automatically and to get job responses as quickly as possible. Current GPU-sharing approaches either lack good support for multiple users or are not designed to work effectively in such cases. This paper proposes a GPU-sharing method among multiple users for short GPU tasks. We implement a container-based batch computing system that accepts and executes users' jobs, which are specified as container images with accompanying configurations. A shortest-job-first based scheduling policy gives priority to short tasks while preventing long tasks from starving. Our evaluation demonstrates that the proposed method is effective and that the system has low overhead.
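
To make the scheduling idea concrete, the following is a minimal sketch of a shortest-job-first policy with starvation prevention on a single GPU. It is not the system described in this paper: the anti-starvation mechanism shown here (aging, where a job's score drops the longer it waits) is an assumption chosen for illustration, and the names `Job`, `pick_next`, `simulate`, and `aging_rate`, as well as the user-supplied `estimated_runtime`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    estimated_runtime: float  # seconds; assumed to be supplied at submission
    submit_time: float        # seconds since the scheduler started


def pick_next(queue, now, aging_rate):
    """Return the queued job with the lowest aged score.

    score = estimated_runtime - aging_rate * waiting_time
    With aging_rate == 0 this is plain shortest-job-first. A positive
    aging_rate (an assumed mechanism) lets long-waiting jobs overtake
    newly arrived short ones.
    """
    def score(job):
        waiting_time = now - job.submit_time
        return job.estimated_runtime - aging_rate * waiting_time

    return min(queue, key=score)


def simulate(jobs, aging_rate=0.5):
    """Run jobs one at a time on a single GPU; return the execution order."""
    pending = sorted(jobs, key=lambda j: j.submit_time)
    queue, order, now = [], [], 0.0
    while pending or queue:
        # Admit every job that has been submitted by the current time.
        while pending and pending[0].submit_time <= now:
            queue.append(pending.pop(0))
        if not queue:
            # GPU is idle: jump ahead to the next submission.
            now = pending[0].submit_time
            continue
        job = pick_next(queue, now, aging_rate)
        queue.remove(job)
        now += job.estimated_runtime  # non-preemptive: run to completion
        order.append(job.name)
    return order


if __name__ == "__main__":
    jobs = [
        Job("long", 100, 0),
        Job("s1", 10, 0),
        Job("s2", 10, 9),
        Job("s3", 10, 19),
    ]
    print(simulate(jobs, aging_rate=0.0))  # plain shortest-job-first
    print(simulate(jobs, aging_rate=5.0))  # with aging
```

In this toy workload, plain shortest-job-first runs the long job last, behind every short job that arrives while it waits. With aging enabled, the long job overtakes the most recently submitted short job once it has waited long enough. The value of `aging_rate` controls the trade-off between response time for short tasks and the maximum wait of long ones.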