Heterogeneous Computing Utilizing FPGAs

Heterogeneous computing plays an ever-increasing role in power-efficient, high-performance embedded systems for data processing tasks such as computer vision. One way to accelerate such applications is to use FPGAs as co-processors for standard CPUs. Although High-Level Synthesis tools are making hardware design easier, the question of how to interface FPGAs with CPUs has yet to be completely solved. The Heterogeneous System Architecture (HSA) Foundation defines and publishes architecture-neutral standards for heterogeneous systems and programming models. While compatible CPU, GPU, and DSP designs exist, FPGA models have not yet been defined. This paper describes the IP library LibHSA, which greatly simplifies the integration of domain-specific FPGA acceleration into existing HSA-compliant systems. It allows FPGA-based accelerators to take immediate advantage of high-level language tool chains, including user-space memory access, low-latency task dispatch, and other benefits of the HSA programming model. We demonstrate LibHSA with a programmable image processor implemented on a Xilinx FPGA. The image processor supports low-level algorithms such as Sobel, Median, Laplace, and Gaussian filters. Our results show that the LibHSA infrastructure greatly reduces the effort of integrating FPGAs and customized hardware into existing accelerator systems, runtimes, and application software.
