Overlays are virtual, reconfigurable architectures laid on top of physical FPGA fabrics. An overlay that is specialized for an application, or a class of applications, offers both fast reconfiguration and a minimal performance penalty. Such an overlay is usually implemented by hardware designers in hardware "assembly" languages at the register-transfer level (RTL).
This short article proposes a way for software programmers, rather than hardware designers, to quickly implement an application-specific overlay using high-level customizable IPs. These IPs are expressed succinctly in a specification language whose abstraction level is much higher than RTL but which can nonetheless express many performance-critical loop and data optimizations on FPGAs. They would thus offer competitively high performance at a much lower maintenance cost and with much easier customization.
We propose new language features to easily put the IPs together into an overlay. A compiler automatically implements the specified optimizations to generate an efficient overlay, exposes a multi-tasking programming interface for the overlay, and inserts a runtime scheduler for scheduling tasks to run on the IPs of the overlay, respecting the dependences between the tasks. While an application written in any language can take advantage of the overlay through the programming interface, we show a particular usage scenario, where the application itself is also succinctly specified in the same language.
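To make the scheduler's role concrete, the following is a minimal sketch of dependence-respecting task dispatch, not the scheduler the compiler actually inserts. It assumes tasks are identified by name, dependences are given as an explicit map, and each task is bound to one IP. The function `schedule`, the task names, and the IP names are all hypothetical, modeled on a 2x2 blocked LU decomposition.

```python
from collections import deque


def schedule(tasks, deps, ip_of):
    """Return (task, ip) pairs in an order that respects the dependences.

    tasks: list of task names
    deps:  dict mapping a task to the set of tasks it depends on
    ip_of: dict mapping a task to the (hypothetical) IP that runs it
    """
    tasks = list(tasks)
    # Number of unfinished dependences per task.
    remaining = {t: len(deps.get(t, ())) for t in tasks}
    # Reverse map: which tasks are waiting on each task.
    dependents = {t: [] for t in tasks}
    for t in tasks:
        for d in deps.get(t, ()):
            dependents[d].append(t)

    ready = deque(t for t in tasks if remaining[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append((t, ip_of[t]))  # "dispatch" the task to its IP
        for u in dependents[t]:
            remaining[u] -= 1
            if remaining[u] == 0:
                ready.append(u)

    if len(order) != len(tasks):
        raise ValueError("cyclic dependences between tasks")
    return order


# Hypothetical task graph for a 2x2 blocked LU decomposition.
tasks = ["lu00", "trsm01", "trsm10", "gemm11", "lu11"]
deps = {
    "trsm01": {"lu00"},
    "trsm10": {"lu00"},
    "gemm11": {"trsm01", "trsm10"},
    "lu11": {"gemm11"},
}
ip_of = {
    "lu00": "LU_IP",
    "trsm01": "TRSM_IP",
    "trsm10": "TRSM_IP",
    "gemm11": "GEMM_IP",
    "lu11": "LU_IP",
}

order = schedule(tasks, deps, ip_of)
for task, ip in order:
    print(f"{task} -> {ip}")
```

The sketch dispatches tasks one at a time in a valid topological order. A real runtime would presumably launch every ready task concurrently on any free IP; that detail is omitted here.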
We describe the new language features for expressing overlays, and illustrate the features with an LU decomposer and a convolutional neural network. A system is under construction to implement the language features and workloads.