FBLAS: Streaming Linear Algebra Kernels on FPGA

Reconfigurable hardware is an attractive alternative to load-store architectures, as it allows eliminating expensive control and data movement overheads in computations. In practice, however, these devices are rarely considered in the high-performance computing community, due to the steep learning curve and low productivity of hardware design, and the lack of library support for fundamental operations. We present FBLAS, an open-source implementation of the Basic Linear Algebra Subprograms (BLAS) for FPGAs. The library is implemented with a modern high-level synthesis (HLS) tool to promote productivity, reusability, and maintainability. Numerical routines are designed to be easily composed through on-chip connections, which reduces off-chip communication volume.
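
To illustrate the idea of composing routines through streaming connections, the following is a minimal software analogy, not FBLAS code. It assumes Python generators as a stand-in for on-chip channels, and the function names (`axpy_stream`, `dot_stream`, `axpy_then_dot`) are hypothetical. The point of the sketch is that the intermediate vector produced by the first routine is consumed element by element by the second and is never stored in full, which is the software counterpart of avoiding a round trip through off-chip memory.

```python
from typing import Iterable, Iterator


def axpy_stream(alpha: float, x: Iterable[float], y: Iterable[float]) -> Iterator[float]:
    """Yield z_i = alpha * x_i + y_i one element at a time (hypothetical streaming AXPY)."""
    for xi, yi in zip(x, y):
        yield alpha * xi + yi


def dot_stream(x: Iterable[float], y: Iterable[float]) -> float:
    """Consume two element streams and return their dot product (hypothetical streaming DOT)."""
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi
    return acc


def axpy_then_dot(alpha: float, x: Iterable[float], y: Iterable[float], w: Iterable[float]) -> float:
    """Compute (alpha * x + y) . w by chaining the two routines.

    The generator returned by axpy_stream plays the role of an on-chip
    connection: each z_i is handed to dot_stream as soon as it is produced,
    so the intermediate vector z is never materialized.
    """
    return dot_stream(axpy_stream(alpha, x, y), w)


if __name__ == "__main__":
    result = axpy_then_dot(2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [1.0, 0.0, 1.0])
    print(result)
```

In a non-streaming formulation, the AXPY result would be written to memory and read back by DOT; chaining the two removes that intermediate write and read.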