Deep learning has been successful at solving many kinds of tasks. Hardware accelerators with high performance and parallelism have become mainstream to implement deep neural networks. In order to increase hardware utilization, multiple applications will share the same compute resource. However, different applications may use different deep learning frameworks and occupy different amounts of resources. If there are no scheduling platforms that are compatible with different frameworks, resources competition will result in longer response time, run out of memory, and other errors. When the resources of the system cannot satisfy all the applications at the same time, application switching overhead will be excessive without reasonable resource management strategy. In this paper, we propose Furion - a middleware alleviates overheads for deep learning framework on a single machine. Furion schedules tasks, overlaps the execution of different computing resource, and batches unknown inputs to increase the hardware accelerator utilization. It dynamically manages memory usage for each application to alleviate the overhead of application switching and make a complex model enable implement in a low-end GPU. Our experiment proved that Furion achieves 2.2x-2.7x speedup on the GTX1060.
[1]
Jian Sun,et al.
Deep Residual Learning for Image Recognition
,
2015,
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2]
Xin Wang,et al.
Clipper: A Low-Latency Online Prediction Serving System
,
2016,
NSDI.
[3]
Huaping Chen,et al.
A power-efficient and high performance FPGA accelerator for convolutional neural networks: work-in-progress
,
2017,
CODES+ISSS.
[4]
Quoc V. Le,et al.
Sequence to Sequence Learning with Neural Networks
,
2014,
NIPS.