A Hypervisor for Shared-Memory FPGA Platforms

Cloud providers widely deploy FPGAs as application-specific accelerators for customer use. These providers seek to multiplex their FPGAs among customers via virtualization, thereby reducing running costs. Unfortunately, most virtualization support is confined to FPGAs that expose a restrictive, host-centric programming model in which accelerators cannot issue direct memory accesses (DMAs). The host-centric model incurs high runtime overhead for workloads that exhibit pointer chasing. Thus, FPGAs are beginning to support a shared-memory programming model in which accelerators can issue DMAs. However, virtualization support for shared-memory FPGAs is limited. This paper presents Optimus, the first hypervisor that supports scalable shared-memory FPGA virtualization. Optimus offers both spatial multiplexing and temporal multiplexing to provide efficient and flexible sharing of each accelerator on an FPGA. To share the FPGA-CPU interconnect at a high clock frequency, Optimus implements a multiplexer tree. To isolate each guest's address space, Optimus introduces the technique of page table slicing as a hardware-software co-design. To support preemptive temporal multiplexing, Optimus provides an accelerator preemption interface. We show that Optimus supports eight physical accelerators on a single FPGA and improves the aggregate throughput of twelve real-world benchmarks by 1.98x-7x.
