Code generation from a domain-specific language for C-based HLS of hardware accelerators

As today's computer architectures become increasingly heterogeneous, a plethora of options, including CPUs, GPUs, DSPs, reconfigurable logic (FPGAs), and other application-specific processors, come into consideration for close-to-sensor processing. Especially in the domain of image processing on mobile devices, a very stringent energy budget is of utmost importance among numerous design challenges, making embedded GPUs and FPGAs ideal implementation targets. Recently, the HIPAcc framework was proposed as a means for automatic code generation of image processing algorithms for embedded GPUs, based on a Domain-Specific Language (DSL). Despite huge advances in High-Level Synthesis (HLS) for FPGAs, designers are still required to have detailed knowledge of coding techniques and of the targeted architecture to achieve efficient solutions. As a remedy, in this work, we propose code generation techniques for C-based HLS that target FPGAs from a common high-level DSL description. Our approach includes FPGA-specific memory architectures for handling point and local operators, numerous high-level transformations, and automatic test bench generation. We evaluate our approach by comparing the resulting hardware accelerators to those produced by existing frameworks in terms of performance and resource requirements. Moreover, we assess the achieved energy efficiency in contrast to software implementations executed on GPUs, generated by HIPAcc from the same code base.
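
To make the distinction between point and local operators concrete, the following is a minimal C sketch of the line-buffer and sliding-window structure commonly found in hand-written C code for HLS, together with a small self-checking test bench. It is not the code generated by the approach described above. The image size, the K x K box-sum kernel, the doubling point operator, the "valid" border handling (no output for windows that leave the image), and all function names (point_op, local_op, stream_filter, reference_filter, testbench) are hypothetical choices for illustration. No vendor-specific HLS pragmas are used.

```c
#include <assert.h>
#include <stdio.h>

#define W 8            /* hypothetical image width */
#define H 6            /* hypothetical image height */
#define K 3            /* hypothetical window size of the local operator */
#define OH (H - K + 1) /* output height in "valid" border mode */
#define OW (W - K + 1) /* output width in "valid" border mode */

/* Point operator: depends only on the current pixel, so it needs no buffer.
   Here: scale by 2 and saturate to 8 bits. */
static int point_op(int p) {
    int v = 2 * p;
    return v > 255 ? 255 : v;
}

/* Local operator: here a K x K box sum over the current window. */
static int local_op(int win[K][K]) {
    int s = 0;
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            s += win[i][j];
    return s;
}

/* Streaming version: one pixel per iteration in raster order. K-1 line
   buffers hold the previous rows and a K x K register window slides over
   them, so each input pixel is read exactly once. */
static void stream_filter(int in[H][W], int out[OH][OW]) {
    int line_buf[K - 1][W] = {{0}};
    int win[K][K] = {{0}};
    assert(K >= 2); /* the scheme below needs at least one line buffer */
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            int p = point_op(in[y][x]);
            /* shift the window one column to the left */
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K - 1; j++)
                    win[i][j] = win[i][j + 1];
            /* new rightmost column: buffered rows plus the current pixel */
            for (int i = 0; i < K - 1; i++)
                win[i][K - 1] = line_buf[i][x];
            win[K - 1][K - 1] = p;
            /* rotate the line buffers at column x */
            for (int i = 0; i < K - 2; i++)
                line_buf[i][x] = line_buf[i + 1][x];
            line_buf[K - 2][x] = p;
            /* emit once the window lies completely inside the image */
            if (y >= K - 1 && x >= K - 1)
                out[y - K + 1][x - K + 1] = local_op(win);
        }
    }
}

/* Direct reference: rebuild every window from the full input image. */
static void reference_filter(int in[H][W], int out[OH][OW]) {
    for (int y = 0; y < OH; y++)
        for (int x = 0; x < OW; x++) {
            int win[K][K];
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    win[i][j] = point_op(in[y + i][x + j]);
            out[y][x] = local_op(win);
        }
}

/* Test bench: run both versions on a synthetic image and count mismatches. */
int testbench(void) {
    int in[H][W], hw[OH][OW], ref[OH][OW];
    int errors = 0;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            in[y][x] = (y * W + x) * 37 % 200;
    stream_filter(in, hw);
    reference_filter(in, ref);
    for (int y = 0; y < OH; y++)
        for (int x = 0; x < OW; x++)
            if (hw[y][x] != ref[y][x]) {
                printf("mismatch at (%d,%d): %d != %d\n",
                       y, x, hw[y][x], ref[y][x]);
                errors++;
            }
    return errors;
}
```

A host program would call testbench() from main and treat a non-zero return value as a failure. The point operator is applied on the fly and needs no storage. The local operator only keeps (K-1) x W pixels in line buffers instead of a whole frame. On an FPGA, such buffers are typically mapped to on-chip block RAM and the window to registers.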
