CHiMPS: A C-level compilation flow for hybrid CPU-FPGA architectures

This paper describes CHiMPS, a C-based accelerator compiler for hybrid CPU-FPGA computing platforms. CHiMPS's goal is to facilitate FPGA programming for high-performance computing developers. It takes generic ANSI C code as input and automatically generates VHDL blocks for an FPGA. The accelerator architecture is customized with multiple caches that are tuned to the application. Speedups of 2.8x to 36.9x (geometric mean 6.7x) are achieved on a variety of HPC benchmarks with minimal source code changes.
