DO-GPU: Domain Optimizable Soft GPUs

“Soft” GPUs are overlays that implement GPGPU-like data-parallel processor architectures in FPGA logic to make FPGAs as software-programmable as “hard” GPGPUs. Unlike hard GPUs, soft GPU architectures can be specialized to further improve efficiency by leveraging the FPGA’s flexibility. Prior work has shown the software-programmability potential of soft GPUs, but it has studied only general-purpose soft GPUs with minor specializations (e.g., FGPU, FlexGrip, MIAOW, and SCRATCH) or soft GPUs optimized for a single application domain (e.g., PDL-FGPU for the persistent deep learning domain). This paper proposes a soft GPU development framework that automates the creation of soft GPU instances with aggressive application-domain optimizations (i.e., domain-optimized GPUs, or DO-GPUs). The framework consists of a baseline general soft GPU architecture “template”, which improves on prior general-purpose soft GPUs, along with a customizable partition into which a custom datapath (macro unit) can be inserted to optimize for a target application domain. Unlike the prior PDL-FGPU, which targets only the persistent deep learning domain, the proposed framework can target any application domain. Our evaluation on a set of data-parallel workloads shows that (i) the proposed general soft GPU architecture offers an average speedup of 1.8x over the best prior soft GPUs we know of (i.e., FGPU and PDL-FGPU), (ii) DO-GPUs with domain optimizations provide an average speedup of 218x over general soft GPUs, (iii) the proposed framework enabled building six new domain-optimized soft GPU instances in a matter of days, and (iv) the framework enables quick, GPU-like development (hours), where code is concise (low hundreds of lines) and can be compiled in seconds without FPGA EDA tools in the loop, assuming an appropriate soft DO-GPU bitstream for the application domain is already built.