Optimizing Goodput of Real-time Serverless Functions using Dynamic Slicing with vGPUs

As the popularity and relevance of the Function-as-a-Service (FaaS) model keep growing, we believe newer avatars of the service will support computationally intensive SIMT functions that execute on GPUs. With hardware-assisted virtualization of GPUs now possible, cloud offerings that include GPUs usually bind a virtual GPU (vGPU) to a VM. While several scheduling algorithms can multiplex vGPUs onto the physical GPU, the work-conserving best-effort scheduler maintains high GPU utilization. Under this scheduler, we observe that the share of the GPU available to each VM is non-deterministic and depends on how the other VMs load the GPU through their vGPUs. As a result, any function-to-vGPU scheduler that does not explicitly account for this non-deterministic vGPU capacity suffers from lower-than-optimal goodput, particularly when functions are deadline-bound, as is the case with FaaS offerings today. In this work, we exploit a software-based task-slicing technique that dynamically determines task sizes for scheduling on vGPUs, so as to maximize the number of functions that complete within their deadlines. Our solution extends the conventional earliest-deadline-first (EDF) scheduling algorithm by balancing scheduling opportunities (via kernel slicing) against the chances of functions finishing before their deadlines. The work is motivated by the observation that static decisions, whether treating entire tasks as scheduling units or using a fixed, statically chosen slice size, cannot adapt to the non-deterministic vGPU capacity. Compared with the well-known deadline-aware EDF scheduler, our solution improves goodput by up to 2.9x.
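The core scheduling idea can be sketched as a minimal simulation (this is an illustrative sketch, not the paper's actual implementation: the `Task` abstraction, the capacity trace standing in for the non-deterministic vGPU share, and the early-drop admission policy are all assumptions made here for clarity). Each quantum, the earliest-deadline task runs one slice sized by the vGPU capacity actually delivered that quantum, and the capacity observed is fed back as the estimate used to decide whether a task can still meet its deadline:

```python
import heapq

class Task:
    def __init__(self, name, work, deadline):
        self.name = name          # function identifier
        self.work = work          # remaining work (work units)
        self.deadline = deadline  # absolute deadline (time units)

def sliced_edf(tasks, capacity, quantum=1.0):
    """EDF over kernel slices: each quantum, run one slice of the
    earliest-deadline task, sized by the vGPU capacity delivered that
    quantum. Tasks whose estimated finish time (using the most recently
    observed capacity) exceeds their deadline are dropped early so the
    GPU time goes to functions that can still succeed (goodput)."""
    heap = [(t.deadline, i, t) for i, t in enumerate(tasks)]
    heapq.heapify(heap)
    now, step = 0.0, 0
    done, dropped = [], []
    est_cap = capacity[0]                    # initial capacity estimate
    while heap:
        _, i, t = heapq.heappop(heap)
        # Admission check against the *estimated* capacity: if the task
        # cannot finish by its deadline even at this rate, drop it now.
        if now + t.work / est_cap > t.deadline:
            dropped.append(t.name)
            continue
        cap = capacity[min(step, len(capacity) - 1)]  # capacity this quantum
        t.work -= min(t.work, cap * quantum)          # run one slice
        now += quantum
        step += 1
        est_cap = cap                                 # feedback: update estimate
        if t.work <= 0:
            done.append(t.name)
        else:
            heapq.heappush(heap, (t.deadline, i, t))
    return done, dropped
```

With a steady capacity of 2 work units per quantum, `sliced_edf([Task("a", 4, 4), Task("b", 2, 6)], [2] * 10)` completes both tasks in deadline order, while an infeasible task such as `Task("c", 10, 3)` is dropped at admission rather than wasting GPU time.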