Accelerating GPU Computing at Runtime with Binary Optimization

Many applications today use GPUs (Graphics Processing Units) to achieve high performance. On GPU servers, however, the CPUs often sit idle while the GPU does the work, and this spare CPU capacity is typically ignored. In this paper, we explore using that idle CPU resource to speed up GPU programs. We design a dynamic binary optimization framework that accelerates GPU computing at runtime. We propose a template-based binary optimization method that optimizes kernels while avoiding the high cost of runtime kernel compilation: it replaces variables whose values are determined at runtime with constant values and generates an optimized binary kernel. Guided by an analysis of optimization opportunities, the framework replaces the original kernels with the optimized kernels during program execution. The experimental results show that accelerating GPU programs via binary optimization is feasible: applied to five convolution layers of deep neural networks, it achieves an average performance improvement of 20%.
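To make the core idea concrete, the sketch below mimics template-based specialization in plain Python rather than at the GPU binary level: a kernel template is filled in with parameter values that have become constant at runtime, and the specialized source is compiled into a new function. This is only an illustration of the principle (constant substitution into a prepared template), not the paper's actual implementation, which patches compiled GPU kernels; all names here are hypothetical.

```python
# Illustrative sketch, NOT the paper's implementation: the paper specializes
# compiled GPU kernels, while this demo specializes Python source. The shared
# idea is substituting runtime-determined constants into a pre-built template,
# which is far cheaper than recompiling a kernel from scratch.

GENERIC_KERNEL = """
def kernel(data, width, stride):
    out = []
    for i in range(0, width, stride):
        out.append(data[i] * 2)
    return out
"""

SPECIALIZED_TEMPLATE = """
def kernel(data):
    # width={width} and stride={stride} are now constants; a real binary
    # optimizer would fold them into immediates in the machine code.
    out = []
    for i in range(0, {width}, {stride}):
        out.append(data[i] * 2)
    return out
"""

def specialize(width, stride):
    """Fill the template with the now-constant parameters and compile it."""
    src = SPECIALIZED_TEMPLATE.format(width=width, stride=stride)
    ns = {}
    exec(compile(src, "<specialized>", "exec"), ns)
    return ns["kernel"]

# Build the generic version for comparison.
_ns = {}
exec(compile(GENERIC_KERNEL, "<generic>", "exec"), _ns)
generic_kernel = _ns["kernel"]

if __name__ == "__main__":
    data = list(range(8))
    fast = specialize(width=8, stride=2)
    # Both versions compute the same result; the specialized one no longer
    # reads width/stride as runtime arguments.
    assert fast(data) == generic_kernel(data, 8, 2)
    print(fast(data))
```

In the paper's setting, the analogous step happens on the idle CPU: once a kernel argument is observed to be fixed, a pre-generated binary template is patched with that constant and swapped in for the original kernel at the next launch.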
