Accelerating DNNs from local to virtualized FPGA in the Cloud: A survey of trends

Abstract: Field-programmable gate arrays (FPGAs) are widely used locally to accelerate deep neural network (DNN) algorithms, offering high computational throughput and energy efficiency. Virtualizing FPGAs and deploying them in the cloud are becoming increasingly attractive approaches to DNN acceleration because they expand the available computing capacity and provide on-demand acceleration to multiple users. In the past five years, researchers have extensively investigated various directions for FPGA-based DNN accelerators, such as algorithm optimization, architecture exploration, capacity improvement, resource sharing, and cloud construction. However, previous surveys of DNN accelerators mainly focused on optimizing DNN performance on a local FPGA and overlooked the trend of placing DNN accelerators on cloud FPGAs. In this study, we conduct an in-depth investigation of the technologies used in FPGA-based DNN accelerators, including but not limited to architectural design, optimization strategies, virtualization technologies, and cloud services. Additionally, we study the evolution of DNN accelerators: from a single DNN to framework-generated DNNs, from physical to virtualized FPGAs, from local to cloud deployment, and from single-user to multi-tenant operation. We also identify significant obstacles to DNN acceleration in the cloud. This article enhances the current understanding of the evolution of FPGA-based DNN accelerators.
