Heterogeneous system implementation of deep learning neural network for object detection in OpenCL framework

One of today's major challenges is how to implement an up-to-date object detection algorithm on a heterogeneous system. Deep learning neural network (DNN) algorithms have achieved very satisfactory performance on the 2012 Visual Object Classes Challenge (VOC) [1], but they depend on the CUDA [2] GPU framework and can run only on NVIDIA accelerators. We prefer a more generic acceleration framework, and OpenCL [3] meets this requirement. Whereas CUDA targets NVIDIA GPUs only, OpenCL can be applied to heterogeneous systems that include CPUs, GPUs, DSPs, FPGAs, etc. Heterogeneous systems are more flexible: some are designed for portable devices, and others for low-power parallel computation. These specialized devices play a very important role in modern life. In this paper, we present an OpenCL-based heterogeneous system implementation and apply a DNN framework to two typical heterogeneous systems: a portable system and an FPGA system. Our work makes the following contributions: (1) We implement a generic OpenCL-based DNN object recognition framework that can be executed on general GPUs (AMD, NVIDIA, etc.). (2) We implement our framework on the Odroid XU4 [4] embedded system using multiple GPUs and improve processing time by 25.8%. (3) We implement our framework on an FPGA system and reduce power consumption by 84.3% compared with a Titan X GPU.

[1] Ali Farhadi, et al. YOLO9000: Better, Faster, Stronger, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Wei Liu, et al. SSD: Single Shot MultiBox Detector, 2015, ECCV.

[3] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.