Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study

A systematic approach to comparing the graphics processor (GPU) with reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence, on two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx Virtex-4 field programmable gate array (FPGA). For arithmetic-intensive algorithms, each device achieves a speedup of two orders of magnitude over a general-purpose processor. An FPGA outperforms a GPU for algorithms requiring large numbers of regular memory accesses, whereas the GPU is superior for algorithms with variable data reuse. In the presence of data dependence, a customized data path implemented in an FPGA exceeds GPU performance by up to eight times. Finally, the analysis is extended to consider trends in newer and future technologies.
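The idea of comparing devices through throughput drivers can be illustrated with a simple multiplicative model. This is a minimal sketch under an explicit assumption: the abstract does not name the three drivers, so here they are taken to be clock rate, iteration-level parallelism (ItLP, the number of outputs computed concurrently), and cycles per output (CPO), combined as throughput = clock rate × ItLP / CPO. The `Device` class, its field names, and every numeric value below are hypothetical placeholders, not measurements of the GeForce 7900 GTX or the Virtex-4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device description for an assumed three-driver throughput model.

    Field values used with this class are illustrative placeholders, not measured data.
    """
    name: str
    clock_hz: float           # driver 1 (assumed): clock rate in Hz
    itlp: float               # driver 2 (assumed): iteration-level parallelism
    cycles_per_output: float  # driver 3 (assumed): clock cycles per output per parallel lane

    def throughput(self) -> float:
        """Outputs per second under the assumed model: clock * ItLP / CPO."""
        return self.clock_hz * self.itlp / self.cycles_per_output


def speedup(a: Device, b: Device) -> float:
    """Return how many times faster device `a` is than device `b` under the model."""
    return a.throughput() / b.throughput()


if __name__ == "__main__":
    # Placeholder numbers chosen only to make the arithmetic easy to follow.
    gpu = Device("gpu_example", clock_hz=500e6, itlp=20, cycles_per_output=10)
    fpga = Device("fpga_example", clock_hz=250e6, itlp=8, cycles_per_output=1)
    print(gpu.throughput())     # 1e9 outputs per second
    print(fpga.throughput())    # 2e9 outputs per second
    print(speedup(fpga, gpu))   # 2.0
```

In this toy setting the hypothetical FPGA runs at half the clock rate and with less parallelism, yet delivers twice the throughput because its customized data path is assumed to produce one output per cycle. This is one way to read how a lower cycles-per-output figure can outweigh a clock-rate disadvantage when data dependence limits the GPU.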
