FPGA-based accelerator design from a domain-specific language

Many image processing applications come with stringent requirements regarding performance, energy efficiency, and power. FPGAs have proven to be among the most suitable architectures for algorithms that can be processed in a streaming pipeline. Yet, designing imaging systems for FPGAs remains a very time-consuming task. High-Level Synthesis, which has improved significantly in recent years, promises to overcome this obstacle. In particular, Altera OpenCL is a convenient solution for employing an FPGA in a heterogeneous system, as it covers all device communication. However, obtaining efficient hardware implementations requires extensive code modifications that contradict OpenCL's data-parallel programming paradigm. In this work, we explore a programming methodology that yields significantly better hardware implementations for the Altera Offline Compiler. Furthermore, we designed a compiler back end for a domain-specific source-to-source compiler that raises the algorithm description to a higher level and generates highly optimized OpenCL code. Moreover, we extended the compiler to support arbitrary bit width operations, which are fundamental to hardware designs. We evaluate our approach by discussing the resulting implementations across an extensive application set and comparing them with example designs provided by Altera. In addition, because multiple implementations for completely different target platforms can be derived from the same domain-specific language source code, we compare the achieved implementations with GPU implementations.
