Paper Information - Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels

Developing high-performance embedded vision applications requires balancing run-time performance against energy constraints. Given the mix of hardware accelerators available for embedded computer vision (e.g. multi-core CPUs, GPUs, and FPGAs) and their associated vendor-optimized vision libraries, navigating this fragmented solution space is a challenge for developers. To help them determine which embedded platform best suits their application, we conduct a comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels. We discuss why a given hardware architecture innately performs well or poorly, based on the characteristics of a range of vision kernel categories. Specifically, our study covers three commonly used hardware accelerators for embedded vision applications: the ARM Cortex-A57 CPU, the Jetson TX2 GPU, and the ZCU102 FPGA, using their vendor-optimized vision libraries: OpenCV, VisionWorks, and xfOpenCV. Our results show that, for simple kernels, the GPU achieves an energy/frame reduction ratio of 1.1–3.2× compared to the others. For more complicated kernels and complete vision pipelines, the FPGA outperforms the others with energy/frame reduction ratios of 1.2–22.3×. We also observe that the FPGA performs increasingly better as the complexity of a vision application's pipeline grows.
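As a rough illustration of the energy/frame metric quoted above, the following Python sketch computes energy per frame and a reduction ratio. It assumes energy per frame is average power multiplied by per-frame processing time, and that the reduction ratio is the other platform's energy per frame divided by that of the best platform. The function names and all power and timing values are hypothetical placeholders, not measurements from the paper.

```python
def energy_per_frame(avg_power_watts, frame_time_seconds):
    """Energy in joules spent on one frame (assumed: power x time)."""
    return avg_power_watts * frame_time_seconds


def reduction_ratio(other_joules, best_joules):
    """How many times less energy the best platform uses per frame."""
    return other_joules / best_joules


# Hypothetical (average power in W, time per frame in s) for one kernel.
# These numbers are made up for illustration only.
measurements = {
    "cpu": (4.0, 0.010),
    "gpu": (6.0, 0.004),
    "fpga": (5.0, 0.006),
}

energy = {
    name: energy_per_frame(power, time)
    for name, (power, time) in measurements.items()
}

# The platform with the lowest energy per frame is the reference.
best = min(energy, key=energy.get)

ratios = {
    name: reduction_ratio(joules, energy[best])
    for name, joules in energy.items()
    if name != best
}

for name, ratio in ratios.items():
    print(f"{best} uses {ratio:.2f}x less energy per frame than {name}")
```

Note that in this made-up example the platform drawing the most power still has the lowest energy per frame because it finishes each frame sooner, which is why the metric combines power with run time instead of reporting either alone.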
