Accelerating Compute-Intensive Applications with GPUs and FPGAs

Accelerators are special purpose processors designed to speed up compute-intensive sections of applications. Two extreme endpoints in the spectrum of possible accelerators are FPGAs and GPUs, which can often achieve better performance than CPUs on certain workloads. FPGAs are highly customizable, while GPUs provide massive parallel execution resources and high memory bandwidth. Applications typically exhibit vastly different performance characteristics depending on the accelerator. This is an inherent problem attributable to architectural design, middleware support and programming style of the target platform. For the best application-to-accelerator mapping, factors such as programmability, performance, programming cost and sources of overhead in the design flows must be all taken into consideration. In general, FPGAs provide the best expectation of performance, flexibility and low overhead, while GPUs tend to be easier to program and require less hardware resources. We present a performance study of three diverse applications - Gaussian elimination, data encryption standard (DES), and Needleman-Wunsch - on an FPGA, a GPU and a multicore CPU system. We perform a comparative study of application behavior on accelerators considering performance and code complexity. Based on our results, we present an application characteristic to accelerator platform mapping, which can aid developers in selecting an appropriate target architecture for their chosen application.

[1]  Joseph M. Lancaster,et al.  A Banded Smith-Waterman FPGA Accelerator for Mercury BLASTP , 2007, 2007 International Conference on Field Programmable Logic and Applications.

[2]  Dinesh Manocha,et al.  Fast computation of database operations using graphics processors , 2005, SIGGRAPH Courses.

[3]  Yao Zhang,et al.  Scan primitives for GPU computing , 2007, GH '07.

[4]  Fernando Gehm Moraes,et al.  From VHDL register transfer level to SystemC transaction level modeling: a comparative case study , 2003, 16th Symposium on Integrated Circuits and Systems Design, 2003. SBCCI 2003. Proceedings..

[5]  Emile A. Hendriks,et al.  FPGA accelerator for real-time skin segmentation , 2006, 2006 IEEE/ACM/IFIP Workshop on Embedded Systems for Real Time Multimedia.

[6]  Viktor K. Prasanna,et al.  A Hierarchical Performance Model for Reconfigurable Computers , 2007, Handbook of Parallel Computing.

[7]  H. Peter Hofstee,et al.  Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..

[8]  Kevin Skadron,et al.  A performance study of general-purpose applications on graphics processors using CUDA , 2008, J. Parallel Distributed Comput..

[9]  Viktor K. Prasanna,et al.  Efficient hardware data mining with the Apriori algorithm on FPGAs , 2005, 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'05).

[10]  Oskar Mencer,et al.  Comparing FPGAs to Graphics Accelerators and the Playstation 2 Using a Unified Source Description , 2006, 2006 International Conference on Field Programmable Logic and Applications.

[11]  Viktor K. Prasanna,et al.  A Library of Parameterizable Floating-Point Cores for FPGAs and Their Application to Scientific Computing , 2005, ERSA.

[12]  Rüdiger Westermann,et al.  Linear algebra operators for GPU implementation of numerical algorithms , 2003, SIGGRAPH Courses.

[13]  SkadronKevin,et al.  A performance study of general-purpose applications on graphics processors using CUDA , 2008 .

[14]  Erik Lindholm,et al.  NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.

[15]  Samuel Williams,et al.  The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .

[16]  Wayne Luk,et al.  Have GPUs made FPGAs redundant in the field of video processing? , 2005, Proceedings. 2005 IEEE International Conference on Field-Programmable Technology, 2005..

[17]  Kevin Skadron,et al.  Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).