Architecture of Request Distributor for GPU Clusters

The advent of GPU computing has enabled the development of many strategies for accelerating different kinds of simulations. Moreover, instead of running an application on a single GPU, it is now common to use a collection of GPUs. These GPUs can be located in the same machine, in the same network, or even across a wide area network. Unfortunately, distributing and managing GPUs requires additional effort from the user, such as handling data transfer, connections, and processing among GPUs. Request Distributor for GPU Clusters (RDGPUC) is a software architecture that allows companies, institutes, and other users to share their GPU resources. With this architecture, each cluster can keep its own software for managing internal resources, and its owner only needs to write a small piece of code to interact with RDGPUC. This novel design makes the system flexible and allows everyone to share their resources without changing their GPU cluster tools. The system also allows users to submit requests from all kinds of devices and platforms. The administrator of the system can specify resource groups and special schedules for using resources. End users, in turn, can submit their requests to RDGPUC through a simple interface without knowing the internal design or current status of the GPU clusters.
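The division of responsibilities described above can be illustrated with a minimal sketch. It is not RDGPUC's actual interface: the names `ClusterAdapter`, `InMemoryCluster`, `Distributor`, and `Request` are hypothetical, and the first-fit placement rule is an assumption made only to keep the example short. The adapter stands for the small piece of code a cluster owner writes, the group registry stands for the administrator's resource groups, and `submit` stands for the simple end-user interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Request:
    """A user request (hypothetical shape): who asks, how many GPUs, which group."""
    user: str
    gpus_needed: int
    group: str = "default"


class ClusterAdapter(ABC):
    """Hypothetical glue code a cluster owner writes to plug into the distributor."""

    @abstractmethod
    def free_gpus(self) -> int: ...

    @abstractmethod
    def run(self, request: Request) -> str: ...


class InMemoryCluster(ClusterAdapter):
    """Stand-in for a real cluster manager; it only tracks a free-GPU counter."""

    def __init__(self, name: str, total_gpus: int):
        self.name = name
        self._free = total_gpus

    def free_gpus(self) -> int:
        return self._free

    def run(self, request: Request) -> str:
        self._free -= request.gpus_needed
        return f"{self.name}:{request.user}"


@dataclass
class Distributor:
    """Maps admin-defined resource groups to the adapters registered in them."""
    groups: dict = field(default_factory=dict)

    def register(self, group: str, adapter: ClusterAdapter) -> None:
        self.groups.setdefault(group, []).append(adapter)

    def submit(self, request: Request):
        # First-fit (an assumption): use the first cluster in the group with room.
        for adapter in self.groups.get(request.group, []):
            if adapter.free_gpus() >= request.gpus_needed:
                return adapter.run(request)
        return None  # no cluster in the group can serve the request right now
```

In this sketch the end user only constructs a `Request` and calls `submit`; which cluster serves it, and how that cluster manages its GPUs internally, stays hidden behind the adapter.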