High Performance in the Cloud with FPGA Groups

Field-programmable gate arrays (FPGAs) can offer substantial computational performance for many compute-intensive algorithms. However, justifying their purchase and administration costs requires maximizing resource utilization over their expected lifetime. Making FPGAs available in a cloud environment would make them attractive to new types of users and applications and help democratize this increasingly popular technology. Yet no satisfactory technique currently exists for offering FPGAs as cloud resources and sharing them among multiple tenants. We propose FPGA groups, which aggregate the computational power of multiple physical FPGAs and appear to their clients as a single virtual FPGA. FPGA groups are elastic and may be shared among multiple tenants. We present an autoscaling algorithm that maximizes FPGA groups' resource utilization and reduces user-perceived computation latencies. FPGA groups incur a low overhead, on the order of 0.09 ms per submitted task. When faced with a challenging workload, the autoscaling algorithm increases resource utilization from 52% to 61% compared to a static resource allocation, while reducing task execution latencies by 61%.
