Massively parallel programming models used as hardware description languages: The OpenCL case

The problem of automatically generating hardware modules from high level application representations has been at the forefront of EDA research during the last few years. In this paper, we introduce a methodology to automatically synthesize hardware accelerators from OpenCL applications. OpenCL is a recent industry supported standard for writing programs that execute on multicore platforms and accelerators such as GPUs. Our methodology maps OpenCL kernels into hardware accelerators, based on architectural templates that explicitly decouple computation from memory communication whenever this is possible. The templates can be tuned to provide a wide repertoire of accelerators that meet user performance requirements and FPGA device characteristics. Furthermore, a set of high- and low-level compiler optimizations is applied to generate optimized accelerators. Our experimental evaluation shows that the generated accelerators are tuned efficiently to match the applications memory access pattern and computational complexity, and to achieve user performance requirements. An important objective of our tool is to expand the FPGA development user base to software engineers, thereby expanding the scope of FPGAs beyond the realm of hardware design.

[1]  Kevin Skadron,et al.  A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).

[2]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[3]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[4]  Jason Cong,et al.  FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs , 2009, 2009 IEEE 7th Symposium on Application Specific Processors.

[6]  Ed F. Deprettere,et al.  Laura: Leiden Architecture Research and Exploration Tool , 2003, FPL.

[7]  Jason Cong,et al.  AutoPilot: A Platform-Based ESL Synthesis System , 2008 .

[8]  Jarmo Takala,et al.  OpenCL-based design methodology for application-specific processors , 2010, 2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation.

[9]  Feng Yi,et al.  Overview of AVS-video: tools, performance and complexity , 2005, Visual Communications and Image Processing.

[10]  Josep Llosa,et al.  Swing module scheduling: a lifetime-sensitive approach , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.

[11]  Scott A. Mahlke,et al.  PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators , 2002, J. VLSI Signal Process..

[12]  Mark Horowitz,et al.  Chip Multi-Processor Generator , 2007, 2007 44th ACM/IEEE Design Automation Conference.

[13]  Aaftab Munshi,et al.  The OpenCL specification , 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).

[14]  Jason Helge Anderson,et al.  LegUp: high-level synthesis for FPGA-based processor/accelerator systems , 2011, FPGA '11.

[15]  John Wawrzynek,et al.  OpenRCL: Low-Power High-Performance Computing with Reconfigurable Devices , 2010, 2010 International Conference on Field Programmable Logic and Applications.

[16]  Ken Kennedy,et al.  Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .

[17]  Maya Gokhale,et al.  Trident: From High-Level Language to Hardware Circuitry , 2007, Computer.

[18]  Philippe Coussy,et al.  High-Level Synthesis: from Algorithm to Digital Circuit , 2008 .

[19]  Christos D. Antonopoulos,et al.  GLOpenCL: OpenCL support on hardware- and software-managed cache multicores , 2011, HiPEAC.