FlexGrip: A soft GPGPU for FPGAs

Over the past decade, soft microprocessors and vector processors have been used extensively in FPGAs for a wide variety of applications. However, it is difficult to extend their functionality to support the conditional and thread-based execution characteristic of general-purpose graphics processing units (GPGPUs) without recompiling the FPGA hardware for each application. In this paper, we describe the implementation of FlexGrip, a soft GPGPU architecture optimized for FPGA implementation. The architecture supports direct CUDA compilation to a binary that executes on the FPGA-based GPGPU without hardware recompilation. It is also customizable, giving the FPGA designer a selection of GPGPU cores that offer different performance versus area tradeoffs. We evaluate the architecture on a collection of five standard CUDA benchmarks compiled with standard GPGPU compilation tools. Designs that take advantage of FlexGrip's conditional execution capabilities achieve speedups of up to 30× versus a MicroBlaze microprocessor.
