Pass a pointer: Exploring shared virtual memory abstractions in OpenCL tools for FPGAs

Heterogeneous CPU-FPGA systems are gaining momentum in the embedded systems sector and in the data center market. The programming abstractions for CPU-FPGA data transfer that today's commercial programming tools provide are well-suited to certain types of applications, but communication remains difficult to implement for applications that share complex pointer-based data structures between the CPU and the FPGA. This paper focuses on programming environments that provide a virtual memory space shared between the host CPU and one (or potentially several) FPGA devices. One example of shared virtual memory (SVM) is defined by the recent OpenCL 2.0 standard. SVM allows the software and hardware portions of a hybrid application to share complex data structures seamlessly (and concurrently) by simply passing a pointer, which greatly eases the programming of heterogeneous systems. We present a framework that automatically adds the physical infrastructure for SVM into a commercial OpenCL tool for FPGAs. This paper explores the design space for these building blocks and studies the performance impact. We show that, because SVM-enabled implementations can avoid artificially sizing dynamic data structures and can fetch data on the fly, up to 2x speed-up over an OpenCL design without SVM support can be achieved. Our framework is open-source and publicly available.
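
The contrast between copying a dynamic structure into a fixed-size buffer and simply passing a pointer can be illustrated with a minimal sketch in plain C. This is not the framework presented in the paper: the "kernels" here are ordinary host functions, and the names `node_t`, `MAX_NODES`, `flatten`, `sum_buffer`, and `sum_list` are hypothetical and chosen only for illustration. In real OpenCL 2.0 host code, the shared structure would be allocated with `clSVMAlloc` and handed to the kernel with `clSetKernelArgSVMPointer`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pointer-based structure shared between host and accelerator. */
typedef struct node {
    int value;
    struct node *next;
} node_t;

/* Hypothetical capacity of the fixed-size transfer buffer (non-SVM case). */
#define MAX_NODES 4

/* Build a linked list holding the values 1..n in order. */
static node_t *build_list(int n) {
    node_t *head = NULL;
    for (int i = n; i >= 1; --i) {
        node_t *nd = malloc(sizeof *nd);
        assert(nd != NULL);
        nd->value = i;
        nd->next = head;
        head = nd;
    }
    return head;
}

/* Release every node of the list. */
static void free_list(node_t *head) {
    while (head != NULL) {
        node_t *next = head->next;
        free(head);
        head = next;
    }
}

/* Without SVM: the host copies the values into a buffer whose size was
 * fixed in advance. Nodes beyond the capacity are not transferred.
 * Returns the number of values copied. */
static size_t flatten(const node_t *head, int *buf, size_t cap) {
    size_t n = 0;
    for (; head != NULL && n < cap; head = head->next) {
        buf[n++] = head->value;
    }
    return n;
}

/* Stand-in for a kernel that only sees the flattened buffer. */
static long sum_buffer(const int *buf, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += buf[i];
    }
    return sum;
}

/* Stand-in for an SVM-style kernel: it receives the head pointer and
 * follows the links itself, fetching each node as it is needed. */
static long sum_list(const node_t *head) {
    long sum = 0;
    for (; head != NULL; head = head->next) {
        sum += head->value;
    }
    return sum;
}
```

With a six-element list and `MAX_NODES` set to 4, `flatten` transfers only the first four values, whereas `sum_list` traverses the whole structure from a single pointer. Under the assumptions above, this mirrors the two effects named in the abstract: no artificial sizing of dynamic data, and data fetched on demand.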
