(AS)2: Accelerator synthesis using algorithmic skeletons for rapid design space exploration

Hardware accelerators in heterogeneous multiprocessor systems-on-chip are becoming popular as a means of meeting the performance and energy-efficiency requirements of modern embedded systems. Current design methods for accelerator synthesis, such as High-Level Synthesis, are not fully automated. Time-consuming manual iterations are therefore required to explore efficient accelerator alternatives, and the programmer is still required to think in terms of the underlying architecture. In this paper, we present (AS)2: a design flow for Accelerator Synthesis using Algorithmic Skeletons. Skeletonization separates the structure of a parallel computation from an algorithm's functionality, enabling efficient implementations without requiring the programmer to have hardware knowledge. We define three such skeletons (for three image processing kernels) that enable FPGA-specific parallelization techniques and optimizations. As a case study, we present a design space exploration of these skeletons and show how multiple design points with area-performance trade-offs for the accelerators can be efficiently and rapidly synthesized. We show that (AS)2 is a promising direction for accelerator synthesis, as it generates a Pareto front of 8 design points in under half an hour for each of the three image processing kernels.
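
To illustrate the idea of separating the structure of a computation from its functionality, the following is a minimal software sketch, not the skeletons defined in the paper. It assumes a sliding-window traversal with clamped borders; the names `window_skeleton`, `box_blur`, and `erode` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List

Image = List[List[int]]
Window = List[List[int]]


def window_skeleton(image: Image, kernel: Callable[[Window], int], size: int = 3) -> Image:
    """Hypothetical skeleton: owns the traversal structure only.

    The per-window functionality is supplied by `kernel`, so the same
    structure can be reused for different image processing operations.
    """
    r = size // 2
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Build the window, clamping coordinates at the image border
            # (an assumed boundary policy, not taken from the paper).
            win = [
                [image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                 for dx in range(-r, r + 1)]
                for dy in range(-r, r + 1)
            ]
            out[y][x] = kernel(win)
    return out


def box_blur(win: Window) -> int:
    """Functionality 1: integer mean of the window."""
    flat = [p for row in win for p in row]
    return sum(flat) // len(flat)


def erode(win: Window) -> int:
    """Functionality 2: minimum of the window."""
    return min(p for row in win for p in row)


if __name__ == "__main__":
    img = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    print(window_skeleton(img, box_blur))
    print(window_skeleton(img, erode))
```

In this sketch the programmer writes only `box_blur` or `erode`; how the window is formed and traversed stays inside the skeleton, which is the part a synthesis flow could specialize for a target architecture.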
