When FPGA Meets Cloud: A First Look at Performance

Cloud service providers promote their new field-programmable gate array (FPGA) infrastructure as a service (IaaS) as a new era of cloud products. FPGA IaaS couples virtualized compute resources with FPGA boards (e.g., Amazon AWS F1) and reserves acceleration capability for specific applications. Although this acceleration technique sounds promising, questions about real-world performance, best-fit scenarios, and portability still need clarification. In this paper, we present one of the first empirical studies to take a close look at FPGA clouds from the tenants’ perspective. We conducted measurement studies on the Amazon AWS, Alibaba, and Huawei clouds for over one year. The experimental results show that: (1) tenants experience a severe performance-cost imbalance on FPGA IaaS platforms; (2) inter-communication performance in FPGA clouds is tightly constrained by hardware drivers, e.g., small optimizations of the PCIe DMA drivers can yield significant performance gains; and (3) virtualized FPGA clouds are far from mature, e.g., small-sized jobs can greatly degrade the performance of FPGA clouds because they underutilize PCIe bandwidth. Our study not only provides useful hints to help tenants select FPGA services, but also sheds some light on how cloud providers can improve the performance of FPGA clouds.
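Finding (3) can be illustrated with a simple latency-bandwidth model. In this model, each DMA transfer pays a fixed setup cost before data streams at the peak PCIe rate, so a small job spends most of its time on overhead rather than on moving data. The following is a minimal sketch under that assumption only. The 12 GB/s peak bandwidth and 10 µs per-transfer overhead are hypothetical placeholders, not values measured in this study, and the model ignores queuing, virtualization layers, and driver batching.

```python
# Hypothetical parameters for illustration only; NOT measurements from this study.
PEAK_BANDWIDTH_GBPS = 12.0       # assumed peak host-to-FPGA PCIe throughput (GB/s)
PER_TRANSFER_OVERHEAD_US = 10.0  # assumed fixed cost per DMA transfer (microseconds)


def utilization(job_bytes: int,
                peak_gbps: float = PEAK_BANDWIDTH_GBPS,
                overhead_us: float = PER_TRANSFER_OVERHEAD_US) -> float:
    """Fraction of peak PCIe bandwidth achieved by a single transfer.

    Simple model: total_time = fixed_overhead + job_bytes / peak_bandwidth,
    so utilization = transfer_time / total_time.
    """
    if job_bytes <= 0:
        raise ValueError("job_bytes must be positive")
    transfer_s = job_bytes / (peak_gbps * 1e9)  # time spent actually moving data
    overhead_s = overhead_us * 1e-6             # fixed per-transfer setup time
    return transfer_s / (overhead_s + transfer_s)


def half_peak_size_bytes(peak_gbps: float = PEAK_BANDWIDTH_GBPS,
                         overhead_us: float = PER_TRANSFER_OVERHEAD_US) -> float:
    """Job size at which data movement time equals the fixed overhead (50% utilization)."""
    return overhead_us * 1e-6 * peak_gbps * 1e9


if __name__ == "__main__":
    for size in (4 * 1024, 64 * 1024, 1024 * 1024, 64 * 1024 * 1024):
        print(f"{size:>10} bytes -> {utilization(size):6.1%} of peak")
    print(f"50% utilization reached at ~{half_peak_size_bytes():,.0f} bytes")
```

Under these assumed parameters, a 4 KiB job reaches only a few percent of peak bandwidth, while a 64 MiB job comes close to it. This is consistent in direction with the abstract's observation that small-sized jobs underutilize PCIe, though the actual crossover point depends on the real driver and platform overheads.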
