A DSL compiler for accelerating image processing pipelines on FPGAs

This paper describes an automatic approach to accelerating image processing pipelines using FPGAs. An image processing pipeline can be viewed as a graph of interconnected stages that process images successively. Each stage typically performs a point-wise, stencil, or other more complex operation on image pixels. Recent efforts have led to the development of domain-specific languages (DSLs) and optimization frameworks for image processing pipelines. In this paper, we develop an approach to map image processing pipelines expressed in the PolyMage DSL to efficient parallel FPGA designs. Our approach maximally exploits reuse and the available memory bandwidth (or chip resources). When compared to Darkroom, a state-of-the-art approach for compiling a high-level DSL to FPGAs, our approach (a) leads to designs that deliver significantly higher throughput, and (b) supports a greater variety of filters. Furthermore, for some of the benchmarks, the designs we generate improve even on the pre-optimized FPGA implementations provided by vendor libraries.
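
To make the notion of a pipeline of point-wise and stencil stages concrete, the following is a minimal sketch in plain Python. It is not PolyMage DSL syntax and says nothing about the generated FPGA designs; the function names (`pointwise`, `stencil3x3`, `run_pipeline`) and the box filter are assumptions chosen for illustration, and the pipeline is a linear chain, the simplest case of a stage graph.

```python
# Illustrative only: plain Python, not PolyMage DSL syntax.
# All names below are hypothetical and chosen for this sketch.

def pointwise(img, f):
    """Point-wise stage: each output pixel depends on exactly one input pixel."""
    return [[f(p) for p in row] for row in img]

def stencil3x3(img, weights):
    """Stencil stage: each output pixel is a weighted sum of a 3x3 neighbourhood.

    Only interior pixels are computed, so the output is 2 pixels smaller
    than the input in each dimension.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += weights[dy + 1][dx + 1] * img[y + dy][x + dx]
            row.append(acc)
        out.append(row)
    return out

def run_pipeline(img, stages):
    """Apply the stages one after another (a linear chain of stages)."""
    for stage in stages:
        img = stage(img)
    return img

BOX = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # unnormalized 3x3 box filter

stages = [
    lambda im: pointwise(im, lambda p: 2 * p),  # point-wise: scale each pixel
    lambda im: stencil3x3(im, BOX),             # stencil: sum over 3x3 window
]

image = [[1] * 4 for _ in range(4)]  # 4x4 image of ones
result = run_pipeline(image, stages)
```

In this sketch the stencil stage reads each intermediate pixel up to nine times, which is the kind of reuse between neighbouring outputs that the abstract refers to.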
