Compiler-Assisted Selection of Hardware Acceleration Candidates from Application Source Code

Hardware design is a difficult task. Besides ensuring the functional correctness of an implementation, hardware developers are confronted with multiple and often conflicting constraints, such as performance and area cost targets, that require lengthy explorations. This issue is compounded when accelerating complex applications, in which some parts are implemented in software and others are accelerated in hardware. Hardware/software partitioning must be settled early in the development cycle and is far from trivial: at this stage detailed performance measurements are not available, while wrong choices can lead to vastly suboptimal solutions or to wasted implementation effort. To address this challenge, we present a framework for automatically identifying software segments that are promising candidates for hardware acceleration and for evaluating, from unmodified software code, their potential speedup and resource requirements. Our strategy is based on Intermediate Representation (IR) analysis passes, which we embed in the LLVM compiler toolchain, and does not require any time-consuming synthesis. We explore its effectiveness on the reference software implementation of a complex application, the H.264 decoder from the University of Illinois, and demonstrate that our methodology, for a user-defined resource constraint, effectively selects high-performance sets of accelerators.
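
To make the selection problem concrete, the following minimal sketch picks a set of candidates that maximizes the estimated cycle savings under a user-defined area budget. It is an illustration only and not the selection algorithm of the framework: it uses a plain 0/1 knapsack dynamic program, assumes candidates never overlap, and all names (`Candidate`, `select_accelerators`, the cost units, and the example figures) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical acceleration candidate with pre-computed estimates."""
    name: str
    saved_cycles: int  # estimated cycles saved if moved to hardware (assumed)
    area: int          # estimated area cost, non-negative abstract units (assumed)


def select_accelerators(candidates, area_budget):
    """Return (total_saved_cycles, chosen_names) for the best set whose
    summed area does not exceed area_budget (0/1 knapsack, illustrative)."""
    # best[a] holds the best (savings, names) achievable with area at most a.
    best = [(0, ())] * (area_budget + 1)
    for c in candidates:
        # Iterate budgets downwards so each candidate is used at most once.
        for a in range(area_budget, c.area - 1, -1):
            prev_gain, prev_names = best[a - c.area]
            gain = prev_gain + c.saved_cycles
            if gain > best[a][0]:
                best[a] = (gain, prev_names + (c.name,))
    return best[area_budget]


# Made-up example figures, not measurements from the paper.
example = [
    Candidate("loop_a", saved_cycles=60, area=10),
    Candidate("loop_b", saved_cycles=100, area=20),
    Candidate("func_c", saved_cycles=120, area=30),
]
total, chosen = select_accelerators(example, area_budget=50)
print(total, chosen)
```

With a budget of 50 units, the sketch prefers `loop_b` and `func_c` (220 saved cycles) over any set containing `loop_a`, which shows why a greedy choice by individual merit is not sufficient in general.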
