Comparing performance and energy efficiency of FPGAs and GPUs for high productivity computing

This paper provides the first comparison of performance and energy efficiency of high productivity computing systems based on FPGA (Field-Programmable Gate Array) and GPU (Graphics Processing Unit) technologies. The search for higher performance compute solutions has recently led to great interest in heterogeneous systems containing FPGA and GPU accelerators. While these accelerators can provide significant performance improvements, they can also require much more design effort than a pure software solution, reducing programmer productivity. The CUDA system has provided a high productivity approach for programming GPUs. This paper evaluates the High-Productivity Reconfigurable Computer (HPRC) approach to FPGA programming, where a commodity CPU instruction set architecture is augmented with instructions which execute on a specialised FPGA co-processor, allowing the CPU and FPGA to co-operate closely while providing a programming model similar to that of traditional software. To compare the GPU and FPGA approaches, we select a set of established benchmarks with different memory access characteristics, and compare their performance and energy efficiency on an FPGA-based Hybrid-Core system with a GPU-based system. Our results show that while GPUs excel at streaming applications, high-productivity reconfigurable computing systems outperform GPUs in applications with poor locality characteristics and low memory bandwidth requirements.

[1]  Jack Dongarra,et al.  Introduction to the HPCChallenge Benchmark Suite , 2004 .

[2]  Makoto Taiji,et al.  A Comparative Study on ASIC, FPGAs, GPUs and General Purpose Processors in the O(N^2) Gravitational N-body Simulation , 2009, 2009 NASA/ESA Conference on Adaptive Hardware and Systems.

[3]  A. Rollett,et al.  The Monte Carlo Method , 2004 .

[4]  Erik Lindholm,et al.  NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.

[5]  James Demmel,et al.  Benchmarking GPUs to tune dense linear algebra , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.

[6]  Mitsuo Yokokawa,et al.  16.4-Tflops Direct Numerical Simulation of Turbulence by a Fourier Spectral Method on the Earth Simulator , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[7]  Marvin V. Zelkowitz,et al.  Measuring Productivity on High Performance Computers , 2005, IEEE METRICS.

[8]  J. Tukey,et al.  An algorithm for the machine calculation of complex Fourier series , 1965 .

[9]  Christos-Savvas Bouganis,et al.  GPU Versus FPGA for High Productivity Computing , 2010, 2010 International Conference on Field Programmable Logic and Applications.

[10]  Wayne Luk,et al.  Efficient reconfigurable design for pricing asian options , 2010, CARN.

[11]  Z. Weng,et al.  ZDOCK: An initial‐stage protein‐docking algorithm , 2003, Proteins.

[12]  Satoshi Matsuoka,et al.  Bandwidth intensive 3-D FFT kernel for GPUs using CUDA , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.

[13]  Scott Hauck,et al.  Impulse C vs. VHDL for Accelerating Tomographic Reconstruction , 2010, 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines.

[14]  Dah-Jye Lee,et al.  Real-Time Optical Flow Calculations on FPGA and GPU Architectures: A Comparison Study , 2008, 2008 16th International Symposium on Field-Programmable Custom Computing Machines.

[15]  Jeff Mason,et al.  CHiMPS: A C-level compilation flow for hybrid CPU-FPGA architectures , 2008, 2008 International Conference on Field Programmable Logic and Applications.

[16]  Wayne Luk,et al.  A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation , 2009, FPGA '09.

[17]  Samuel Williams,et al.  The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .

[18]  F. Black,et al.  The Pricing of Options and Corporate Liabilities , 1973, Journal of Political Economy.

[19]  N. Metropolis,et al.  The Monte Carlo method. , 1949 .

[20]  Tony M. Brewer,et al.  Instruction Set Innovations for the Convey HC-1 Computer , 2010, IEEE Micro.