Trends of CPU, GPU and FPGA for high-performance computing

Floating-point computing with more than one TFLOPS of peak performance is already a reality in recent Field-Programmable Gate Arrays (FPGAs). General-Purpose Graphics Processing Units (GPGPUs) and recent many-core CPUs have also taken advantage of recent technological innovations in integrated circuit (IC) design and have dramatically improved their peak performance. In this paper, we compare the trends of these computing architectures for high-performance computing and survey these platforms in the execution of algorithms from different scientific application domains. Trends in peak performance, power consumption, and sustained performance for particular applications show that FPGAs are falling further behind GPUs and many-core CPUs, which moves them away from high-performance computing with intensive floating-point calculations. FPGAs become competitive for custom floating-point or fixed-point representations, for smaller input sizes of certain algorithms, for combinational logic problems, and for parallel map-reduce problems.
